Yi Wang

06/15/2023, 8:51 AM
Hi all, the log specification lives at https://github.com/delta-io/delta/blob/master/PROTOCOL.md, but it keeps being updated. Which version of the log protocol does the Delta Lake connector code implement? I am asking because I am trying to implement the Delta Lake transaction log in Go, and the standalone connector is my main source for learning. Thanks in advance!

rtyler

06/15/2023, 12:44 PM
The Delta Spark implementation, the standalone connector, and #delta-rs all follow that protocol. However, we are working on a JVM and a Rust (native) "kernel" layer that all connectors can use to keep up to date more easily with transaction log and protocol changes.

Yi Wang

06/15/2023, 12:47 PM
Because in the latest PROTOCOL.md I see there is a field called rowIdHighWaterMark, but it is not in the standalone connector code, so I am wondering which version it follows.
So, if I understood correctly, the standalone connector code does not implement the latest protocol in https://github.com/delta-io/delta/blob/master/PROTOCOL.md

rtyler

06/15/2023, 12:56 PM
I would say Delta for Spark is always the most up to date, followed by both the standalone connector and delta-rs
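For a Go reader, one way to cope with implementations lagging behind the spec is to read the table's own `protocol` action and compare it with what the reader implements. The following is a minimal sketch, not how the standalone connector does it: the field names (`minReaderVersion`, `minWriterVersion`, `readerFeatures`, `writerFeatures`) come from PROTOCOL.md, while `parseProtocol` and `canRead` are hypothetical helpers, and the check deliberately ignores the feature lists.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Protocol mirrors the `protocol` action from PROTOCOL.md.
type Protocol struct {
	MinReaderVersion int      `json:"minReaderVersion"`
	MinWriterVersion int      `json:"minWriterVersion"`
	ReaderFeatures   []string `json:"readerFeatures,omitempty"`
	WriterFeatures   []string `json:"writerFeatures,omitempty"`
}

// action models one JSON line of a commit file. Only the protocol key is
// decoded here; encoding/json silently skips keys and fields it does not know.
type action struct {
	Protocol *Protocol `json:"protocol"`
}

// parseProtocol (hypothetical helper) returns the protocol action found on a
// log line, or nil when the line holds some other action.
func parseProtocol(line []byte) (*Protocol, error) {
	var a action
	if err := json.Unmarshal(line, &a); err != nil {
		return nil, err
	}
	return a.Protocol, nil
}

// canRead (hypothetical helper) reports whether a reader implementing protocol
// versions up to maxReader may read the table. Simplification: it does not
// inspect ReaderFeatures.
func canRead(p *Protocol, maxReader int) bool {
	return p.MinReaderVersion <= maxReader
}

func main() {
	line := []byte(`{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}`)
	p, err := parseProtocol(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if p == nil {
		fmt.Println("no protocol action on this line")
		return
	}
	fmt.Println("readable by a version 1 reader:", canRead(p, 1))
}
```

Because unknown JSON fields are skipped by default, a decoder like this keeps working when the spec gains a field the code has not modeled yet; the version check is what tells the reader whether ignoring it is safe.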

Yi Wang

06/15/2023, 1:00 PM
OK, thanks for the explanation. I see different tags for the md file, so I was a bit confused.

Matthew Powers

06/15/2023, 1:12 PM
delta-io/delta is a reference implementation of the Delta Lake transaction log protocol. If you ever notice a discrepancy, feel free to open an issue. We really appreciate you digging in here, looking at the details, and building this code. Thanks for making these awesome contributions.
👍 1

rtyler

06/15/2023, 1:24 PM
Somewhat related to this topic is https://github.com/delta-io/delta/issues/1783. There's some work that we're hoping to mature beyond the "simple prototype" stage by Data and AI Summit in a couple of weeks so that it can be open sourced.

Yi Wang

06/15/2023, 1:44 PM
wow, this is nice!