I’d like to start reaching out to different communities and giving them a high-level overview of the next steps for Python deltalake (the new name for delta-rs) as it pertains to their roadmaps. I’m pretty tight with the Polars and Dask communities and can start there. Here’s the messaging I’m thinking of at a high level:
“We’ve built a connector that makes it easy to read Delta tables into Dask DataFrames. This lets users query data faster than regular Parquet data lakes because of file skipping based on transaction log metadata and because it avoids inefficient file listing operations. We’re currently integrating the Arrow Database Connectivity (ADBC) library into delta-rs, which will let us build Dask writers and support delete, merge, and update transactions more easily. We will let you know when the ADBC work is done so we can start on the Dask writers/DML support.”
I’ll plan on sending Ritchie from Polars a similar message. Does this sound like a good plan?
03/03/2023, 4:13 PM
I think for now, maybe we don’t need to explain the ADBC API before even a prototype exists. I’ve found it a little difficult to get people to understand it in the abstract.
So maybe just say “We’re working on new APIs that would allow building Dask writers with support for delete, merge, and update transactions.”
03/03/2023, 7:45 PM
@Will Jones - that works for me. Will start pinging people. Thank you!