Slackbot
04/10/2023, 5:07 PM

Alex Wilcoxson
04/10/2023, 7:25 PM

Slackbot
04/10/2023, 9:16 PM

Randy Sims
04/11/2023, 3:12 PM

Randy Sims
04/11/2023, 3:15 PM

Christina
04/11/2023, 10:45 PM

Hubert Kaczmarczyk
04/12/2023, 8:56 AM

Jeremy Jordan
04/12/2023, 5:55 PM

Parthiban Jaganathan
04/12/2023, 7:02 PM

Leonard Shi
04/13/2023, 6:13 AM
delta-rs directly?

Martin
04/13/2023, 5:07 PM
OPTIMIZE) in order to keep the Delta table healthy.
We are currently leaning towards (1).
What are your thoughts?

Satya Sai Ramnadh Bolisetti
04/13/2023, 5:56 PM
The VACUUM command is taking 15 minutes every day for a very small table. I even enabled the Delta vacuum parallel delete option. Still no improvement. Does anyone have any suggestions to improve this?

Radha Krishna Kanth Popuri
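[Editor's note] For reference, the "parallel delete option" mentioned above corresponds to a Spark conf in open-source Delta Lake. A minimal sketch of enabling it; the table path and retention value here are placeholders, not taken from the thread:

```sql
-- Enable parallel deletion of files during VACUUM
SET spark.databricks.delta.vacuum.parallelDelete.enabled = true;

-- Retention of 168 hours (7 days) is Delta's default safety threshold
VACUUM delta.`/path/to/table` RETAIN 168 HOURS;
```

On a small table, VACUUM time is often dominated by listing files in object storage rather than by the deletes themselves, so parallel delete alone may not help.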
04/13/2023, 8:44 PM

Nitya Thacker
04/13/2023, 9:07 PM
Error: org.apache.spark.SparkUnsupportedOperationException: Cannot evaluate expression: rangepartitionid(input[14, string, true], 1000)
Versions: Delta 2.1.0, Spark 3.3.0+amzn.1.dev0
Errors at: mytable.optimize().where("partitionfield=somevalue").executeZOrderBy("col1", "col2")
Felipe Pessoto
04/13/2023, 9:50 PM

Shubham Goyal
04/14/2023, 8:18 AM

Michaël Gainhao
04/14/2023, 1:41 PM

Hemant Kumar
04/14/2023, 1:44 PM

Nitya Thacker
04/14/2023, 3:02 PM

Deep Patel
04/15/2023, 11:13 PM

Dip
04/17/2023, 5:27 AM

Leo
04/17/2023, 12:37 PM

Rajath Chandregowda
04/17/2023, 3:30 PM

Harish Tallapaneni
04/17/2023, 9:29 PMdf = spark.read.jdbc("jdbc:db2//host.com:400/server", "db.table", properties=conn_prop : {'user': 'user', 'driver': 'com.ibm.db2.jcc.DB2Driver', 'customSchema': 'a smallint, b string, c int, d int', 'DELIMIDENT': 'Y', 'partition_column': 'd', 'num_partitions': '20', 'lower_bound': '2022-02-26', 'upper_bound': '2023-04-04', 'securityMechanism': '18', 'sslConnection': 'true', 'sslTrustStoreLocation': 'path', 'sslKeyStoreType': 'PKCS12', 'sslKeyStorePassword': 'xxxx', 'sslKeyStoreLocation': 'path_to_key', 'sslTrustStoreType': 'jks'})
Satyadeep Sinha
04/18/2023, 2:39 AM

shingo
04/18/2023, 6:40 AM
predicateHints parameter to load_as_pandas for now? Thanks.

David Conner
04/18/2023, 8:17 AM
mp into lake)
I'm looking at the partition discovery section of the Spark SQL guide (and a few other sources), but I can't quite figure out how my data will be treated when it's pulled in. The directory structure looks like this, except I've used a regex to change those directory names to participant_id=$id. Inside each directory there are parquet files where the name is $id.parquet, but they're not in a directory named sequence_id=$id.
.
├── lake
│   └── mp
│       ├── sign_to_prediction_index_map.json
│       ├── train.csv
│       └── train_landmark_files
│           ├── 16069
│           ├── 18796
│           ├── ...
│           ├── 61333
│           └── 62590
Is the directory required? Or can I just rename the files to sequence_id=$id.parquet?
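[Editor's note] Spark-style partition discovery only reads key=value pairs from directory names, never from file names, so renaming the files to sequence_id=$id.parquet would not produce a sequence_id column; the directory is required. A small illustrative sketch of that rule (the paths here are hypothetical):

```python
from pathlib import PurePosixPath

def discover_partitions(file_path: str) -> dict:
    """Mimic Spark's partition discovery: collect key=value pairs
    from the directory components only; the file name is ignored."""
    parts = PurePosixPath(file_path).parts[:-1]  # drop the file name
    return dict(p.split("=", 1) for p in parts if "=" in p)

# The directory component carries the partition column...
print(discover_partitions(
    "lake/mp/train_landmark_files/participant_id=16069/1001.parquet"))
# → {'participant_id': '16069'}

# ...but key=value in a *file* name adds nothing.
print(discover_partitions("participant_id=16069/sequence_id=1001.parquet"))
# → {'participant_id': '16069'}
```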
I'm getting ready to try messing with a DLT using just one participant's data to see what happens. I haven't extracted any additional data, but

David Conner
04/18/2023, 11:34 AM

Kilic Ali-Firat
04/18/2023, 11:40 AM

Lucas Zago
04/19/2023, 12:17 AM
(select sum(case faturado.shkzg
              when 'S' then faturado.menge
              else -faturado.menge
            end)
 from sapsr3.ekbe faturado
 where faturado.ebeln = b.ebeln
   and b.ebelp = faturado.ebelp
   and faturado.vgabe = '1' -- entradas (entries)
) as qtd_faturada
I did not find any useful resource on this.
If someone has a tip, I would appreciate it. Thanks!
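[Editor's note] If the goal is to run this on an engine that handles correlated scalar subqueries poorly, one common rewrite is a pre-aggregated LEFT JOIN. A hedged sketch, assuming the outer alias b comes from a FROM clause not shown in the message:

```sql
-- Aggregate ekbe once per (ebeln, ebelp), then join,
-- instead of re-running the subquery for every outer row.
left join (
    select ebeln,
           ebelp,
           sum(case shkzg when 'S' then menge else -menge end) as qtd_faturada
    from sapsr3.ekbe
    where vgabe = '1'  -- entradas (entries)
    group by ebeln, ebelp
) faturado
  on faturado.ebeln = b.ebeln
 and faturado.ebelp = b.ebelp
```

Rows of b with no matching ekbe entries come back with qtd_faturada as NULL rather than 0, so wrap it in coalesce(qtd_faturada, 0) if the original subquery's semantics matter.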