So initially I thought my reads were slow because of my partitioning, but it's apparently due to delta-rs: passing all the file URIs directly to PyArrow's ParquetDataset is 10x faster.
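For reference, a rough sketch of the comparison I mean, assuming a local table path and the `deltalake` and `pyarrow` packages. The helper names (`time_call`, `read_via_deltalake`, `read_via_pyarrow`, `compare`) are made up for illustration, and a remote store such as Azure would additionally need credentials and a filesystem passed in, which is not shown here.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def read_via_deltalake(table_uri):
    # Imported lazily so the timing helper is usable without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(table_uri).to_pyarrow_table()


def read_via_pyarrow(table_uri):
    from deltalake import DeltaTable
    import pyarrow.parquet as pq

    # file_uris() lists the data files of the current table version;
    # they are handed straight to PyArrow, bypassing the delta-rs read path.
    files = DeltaTable(table_uri).file_uris()
    return pq.ParquetDataset(files).read()


def compare(table_uri):
    """Time both read paths once; table_uri is assumed to be a local path."""
    _, t_delta = time_call(read_via_deltalake, table_uri)
    _, t_arrow = time_call(read_via_pyarrow, table_uri)
    return {"deltalake_s": t_delta, "pyarrow_s": t_arrow}
```

Note that the PyArrow timing here still includes opening the Delta log to list the files, and a single run is noisy, so it is worth repeating the measurement a few times before drawing conclusions.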
Will Jones
09/04/2023, 4:44 PM
Hmm, I need to look into this. It shouldn't be 10x slower than PyArrow; after all, it uses PyArrow's Parquet readers under the hood.
Our company only uses Azure, so I don't know about the others.
It also happens on the local filesystem: I've now checked with smaller partitions, and delta-rs is consistently slower, with the gap growing considerably as the number of partitions increases.