Will Jones

05/09/2023, 11:21 PM
I was extra caffeinated 🤓 this afternoon, so, as promised, here is a document detailing how PyArrow datasets work with deltalake: https://docs.google.com/document/d/1XGg1pf9Nep9GHlSdvO65Ao1kyQ_Z_g55uyHuTYVyeT0/edit?usp=sharing cc @Matthew Powers
👍 3
👀 1
Also realized that while DuckDB and DataFusion support predicate pushdown into PyArrow datasets, Polars doesn’t yet. I commented on an issue in Polars to prod them on this.

Matthew Powers

05/10/2023, 10:06 PM
@Will Jones - this is awesome content!! Well done!
Where would you like to get this published? Can I help at all?

Will Jones

05/10/2023, 10:21 PM
I think I’ll probably split it into two parts:
1. The “What are PyArrow Datasets?” and “How can query engines integrate through PyArrow datasets?” sections should become a new page in our Python deltalake documentation.
2. The rest I might turn into a technical blog post on my personal blog, since it’s mostly interesting implementation details (which could change in the future).

Matthew Powers

05/11/2023, 12:31 AM
I like that plan