Zhang David
09/03/2023, 2:03 PM
Will Jones
09/03/2023, 4:55 PM
from deltalake import write_deltalake
import pyarrow.dataset as ds
dataset = ds.dataset("path/to/parquet", format="parquet")
reader = dataset.scanner().to_reader()
write_deltalake("path/to/deltatable", reader)
If I have created two Delta tables at two different local filesystem locations, how can I merge them into one via the delta-rs Python bindings?
What do you mean by merge? Would it be acceptable to read them both and write a new third one? Or are you saying you want to create a new Delta table that references data files at those two existing locations?
Zhang David
09/03/2023, 9:51 PM
Will Jones
09/03/2023, 11:36 PM
Zhang David
09/04/2023, 12:54 AM
Will Jones
09/04/2023, 3:19 AM
Is it because of this open ticket?
No, it's just because S3 doesn't support an atomic "replace if not exists" or similar operation. To support concurrent writes on S3, you need some sort of external locking mechanism. delta-rs ships with one implemented using DynamoDB. You should be able to set these environment variables to configure it: https://github.com/delta-incubator/dynamodb-lock-rs/blob/f4e21a81d0a39fc2c20d868479045062131e5aba/src/lib.rs#L66-L82
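As a rough sketch of what setting those variables could look like from Python: the variable names below are taken from the linked dynamodb-lock-rs source and the delta-rs documentation as of this thread and may differ in other versions. The values and the helper `configure_dynamodb_lock` are hypothetical.

```python
import os

# Variable names are assumed from the linked source; values are hypothetical.
LOCK_SETTINGS = {
    "AWS_S3_LOCKING_PROVIDER": "dynamodb",
    "DYNAMO_LOCK_TABLE_NAME": "my_delta_lock_table",    # hypothetical table name
    "DYNAMO_LOCK_PARTITION_KEY_VALUE": "my-table-key",  # hypothetical key value
}


def configure_dynamodb_lock(settings=LOCK_SETTINGS):
    """Export the lock settings, keeping any values already set in the environment."""
    for name, value in settings.items():
        os.environ.setdefault(name, value)
    # Return the values that are actually in effect
    return {name: os.environ[name] for name in settings}
```

Call `configure_dynamodb_lock()` before the first write to the S3 table. The DynamoDB table itself has to exist already, and the usual AWS credentials still need to be available.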
Zhang David
09/04/2023, 4:40 AM
Will Jones
09/04/2023, 5:51 AM
Zhang David
09/04/2023, 11:27 AM