Hugo Saavedra

07/28/2023, 8:09 PM
just a few questions about contributing to this project:
1. would it be problematic for me to start thinking about how https://github.com/delta-io/delta-rs/issues/930 might be implemented? or are there issues blocking it/design issues that need to be worked out still?
2. do folks typically look to the scala/spark delta implementation to maintain parity and guide implementation in delta-rs or is there more of a tendency to "greenfield" things to the extent that it makes sense, and only look to satisfy the letter of the spec rather than parity with other implementations?
3. do these tuesday meets still happen? https://github.com/delta-io/delta-rs#development-meeting

Will Jones

07/28/2023, 8:22 PM
1. or are there issues blocking it/design issues that need to be worked out still?
For Rust, no blockers right now. For Python, we need to refactor the API a bit first. I'll create an issue describing that in a little bit.
2. do folks typically look to the scala/spark delta implementation to maintain parity and guide implementation
We don't try to imitate their APIs or implementation, but they are good inspiration for test cases, since they've seen quite a few real-world bug reports. There is a project called Data Acceptance Tests (DAT) where we are trying to collect test cases that all connectors can use. But it's still a bit early.
I'd also mention we are starting to work on delta-kernel-rs (repo is private for now, since it's in an early design stage), which delta-rs will be refactored to use later. So much of our effort to support higher protocol versions is going into that project rather than delta-rs itself. That being said, in the near term it may make sense to add new features to delta-rs itself.
3. do these tuesday meets still happen?
I think we recently stopped these for now in favor of a different meeting for delta-kernel-rs.
@Hugo Saavedra are you looking to implement column mapping for Rust or for Python?

Hugo Saavedra

07/28/2023, 9:00 PM
Thanks @Will Jones, that is good to know about delta-kernel-rs. I'll keep an eye out for that project and take a look at DAT. RE column mappings, I was thinking Rust but I'm motivated mainly by wanting to learn more about the delta format and delta internals, so I'm completely open to do the work for Python if that's preferred or higher priority. my understanding was that Python was bindings-only, or at least that it depended on/was blocked by delta-rs?

Will Jones

07/28/2023, 9:36 PM
I was thinking Rust but I'm motivated mainly by wanting to learn more about the delta format and delta internals
Then I'd say go right ahead. There are no blockers there at the moment.
my understanding was that Python was bindings-only
Python uses the Rust implementation for interacting with the delta log, but uses scanners and writers from PyArrow for interacting with the data files. This is because when we started, the Parquet scanners and writers in Rust weren't very mature (they didn't have support for things like predicate pushdown or partitioning written data). We'll be refactoring it soon to switch over to wrapping Rust for data file interaction too.
So if you implement column mapping support in Rust, then once we refactor Python it will also get column mapping support 🙂
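For readers new to the feature being discussed, here is a rough sketch of what column mapping involves on the read side, assuming the table uses name-based mapping. The two metadata keys (`delta.columnMapping.id` and `delta.columnMapping.physicalName`) come from the Delta protocol; the schema contents and the helper names `logical_to_physical` and `to_logical_row` are hypothetical and are not part of delta-rs or its Python bindings. Nested fields are ignored to keep it short.

```python
import json

# Hypothetical table schema, shaped like the schemaString of a Delta metaData
# action for a table with column mapping enabled. The two metadata keys come
# from the Delta protocol; the column names and values are invented.
SCHEMA_STRING = json.dumps({
    "type": "struct",
    "fields": [
        {
            "name": "customer id",
            "type": "long",
            "nullable": True,
            "metadata": {
                "delta.columnMapping.id": 1,
                "delta.columnMapping.physicalName": "col-a1b2",
            },
        },
        {
            "name": "total",
            "type": "double",
            "nullable": True,
            "metadata": {
                "delta.columnMapping.id": 2,
                "delta.columnMapping.physicalName": "col-c3d4",
            },
        },
    ],
})


def logical_to_physical(schema_string):
    """Map each top-level logical column name to its physical name."""
    schema = json.loads(schema_string)
    mapping = {}
    for field in schema["fields"]:
        metadata = field.get("metadata", {})
        # Fall back to the logical name when no mapping metadata is present.
        mapping[field["name"]] = metadata.get(
            "delta.columnMapping.physicalName", field["name"]
        )
    return mapping


def to_logical_row(physical_row, mapping):
    """Rewrite a row keyed by physical names into one keyed by logical names."""
    physical_to_logical = {phys: logical for logical, phys in mapping.items()}
    return {physical_to_logical.get(key, key): value
            for key, value in physical_row.items()}


mapping = logical_to_physical(SCHEMA_STRING)
# A row as it might come out of a data file, keyed by physical column names.
row = to_logical_row({"col-a1b2": 7, "col-c3d4": 9.5}, mapping)
print(mapping)
print(row)
```

The point of the indirection is that the data files only ever see the physical names, so a logical column can be renamed by editing the table metadata without rewriting any files. A real reader would also have to handle nested struct fields and id-based mapping, which this sketch leaves out.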

Hugo Saavedra

07/28/2023, 9:42 PM
awesome. thank you! i'll leave a note on the issue that i'm working on it and check in if/when I get stuck or have questions
🙌 1

Ion

07/29/2023, 12:09 AM
What things will be solved with delta-kernel? Any issues specifically for the Delta-RS repo?

Will Jones

07/29/2023, 4:46 PM
The benefit of delta kernel is the same for all connectors: it will make it easier for us to keep up with Spark as they add new features to the protocol.
👍 1