
Hanan Shteingart

01/10/2023, 7:13 AM
I have an ETL process which adds parquet files to an S3 bucket. I would like to create a Delta table that stays up to date with these files. I do not want to run CONVERT TO DELTA over the whole path every time (it might take too long); ideally, I would like to update the Delta table as new files arrive. However, the documentation states:
After the table is converted, make sure all writes go through Delta Lake.
So my question is: how do I add new parquet files to Delta when they are already sitting in the Delta table's location?
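For context, a minimal PySpark sketch of the documented path (paths and session config are hypothetical, not from this thread): convert the existing parquet directory once, then route every subsequent write through the Delta writer instead of dropping raw parquet files into the folder:
```python
from pyspark.sql import SparkSession

# Hypothetical session setup; assumes the Delta Lake package is on the classpath.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "s3://my-bucket/events"  # hypothetical parquet directory

# One-time, in-place conversion of the existing parquet files.
spark.sql(f"CONVERT TO DELTA parquet.`{path}`")

# From then on, append new batches through the Delta writer (which updates the
# transaction log) rather than copying raw parquet files into the directory.
new_batch = spark.read.parquet("s3://my-bucket/staging/2023-01-10/")  # hypothetical staging path
new_batch.write.format("delta").mode("append").save(path)
```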

Chris

01/10/2023, 7:14 AM
Are you using Databricks? Auto Loader might be your friend…
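For reference, a Databricks-only sketch of the Auto Loader (cloudFiles) pattern, with hypothetical paths: it incrementally discovers new files in the source prefix and streams them into a Delta table.
```python
# Runs only on Databricks; source, schema, checkpoint and target paths are hypothetical.
stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
    .load("s3://my-bucket/landing/")           # incoming parquet files
)

(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
    .start("s3://my-bucket/delta/events")      # Delta table kept up to date
)
```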

Hanan Shteingart

01/10/2023, 7:15 AM
Kubeflow

Yousry Mohamed

01/10/2023, 10:16 AM
Options like delta-standalone or delta-rs could be considered here. https://delta-io.github.io/delta-rs/python/index.html
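A small sketch of the delta-rs route, assuming the deltalake Python package, a hypothetical S3 path, and AWS credentials available in the environment (S3 writes may additionally need a locking provider or storage_options configured): new data is appended through write_deltalake so the transaction log stays consistent, with no Spark cluster required.
```python
import pyarrow.parquet as pq
from deltalake import DeltaTable, write_deltalake

table_uri = "s3://my-bucket/delta/events"  # hypothetical Delta table location

# Read one newly arrived parquet file (e.g. from a staging prefix) ...
new_data = pq.read_table("s3://my-bucket/staging/part-0001.parquet")  # hypothetical file

# ... and append it through the Delta log rather than copying the raw file
# into the table directory.
write_deltalake(table_uri, new_data, mode="append")

# The table's latest version now includes the new rows.
dt = DeltaTable(table_uri)
print(dt.version(), len(dt.files()))
```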

Hanan Shteingart

01/15/2023, 5:37 PM
Thanks @Chris - they call it Auto Loader, but I think it's just Spark Structured Streaming. However, I could not find a way to pass the Azure credentials to it.
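In case it helps, a sketch (not specific to this thread) of how plain Spark Structured Streaming can usually reach ADLS Gen2: pass the storage account key (or a service principal / OAuth config) through the Hadoop configuration, then stream new parquet files into a Delta table. Account, container, and paths below are hypothetical, and the hadoop-azure (ABFS) jars are assumed to be on the classpath.
```python
# Hypothetical storage account key; a service principal / OAuth setup works similarly.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<storage-account-key>",
)

source = "abfss://landing@mystorageaccount.dfs.core.windows.net/events/"
target = "abfss://tables@mystorageaccount.dfs.core.windows.net/delta/events/"

(
    spark.readStream
    .schema(spark.read.parquet(source).schema)   # streaming file sources need an explicit schema
    .parquet(source)
    .writeStream
    .format("delta")
    .option("checkpointLocation", target + "_checkpoint")
    .start(target)
)
```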