
Ion

09/04/2023, 4:18 PM
So initially I thought my slow read times were due to my partitioning, but apparently it's delta-rs: passing all the file URIs directly to PyArrow's ParquetDataset is 10x faster.

Will Jones

09/04/2023, 4:44 PM
Hmm, I need to look into this. It shouldn't be 10x slower than PyArrow; after all, it uses PyArrow's Parquet readers under the hood.

Ion

09/04/2023, 4:48 PM

Will Jones

09/04/2023, 4:49 PM
Are you also using Azure?
Or is this reproducible on S3 or GCS?

Ion

09/04/2023, 4:49 PM
Our company only uses Azure, so I don't know about the others.
It also happens on the local filesystem, which I've now checked with smaller partitions: delta-rs is consistently slower, and the gap grows considerably with the number of partitions.