
rtyler

01/11/2023, 12:17 AM
TIL about Column Invariants from @TD which are 100% not being looked at when the Rust writer does its writes 🤣 https://github.com/delta-io/delta/blob/master/PROTOCOL.md#column-invariants

Will Jones

01/11/2023, 12:48 AM
They are in some code paths, I thought
Also they are deprecated, because Spark doesn’t always respect them either

rtyler

01/11/2023, 12:50 AM
where is there any comment about deprecation?

Will Jones

01/11/2023, 12:50 AM
We have a function to check these, but I don’t think it’s used outside of the Python writer and the writer in the operations module https://github.com/delta-io/delta-rs/blob/d275c6a2cbbaf96ba5994dd24ad3c7420a9fe34c/rust/src/delta_datafusion.rs#L735
Well it’s not explicitly deprecated, but I’ve asked it to be
It’s not documented; I had to really dig in order to figure out how to make a table that has column invariants in Spark
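
For readers unfamiliar with the feature: per the linked PROTOCOL.md section, a column invariant lives in the column's metadata in the table schema, under the key delta.invariants, as a JSON string holding a SQL expression. Below is a minimal sketch of pulling those expressions out of a schema string. The schema contents and the extract_invariants helper are made up for illustration and are not part of delta-rs; nested fields are ignored.

```python
import json

# Hypothetical schema string, shaped like the schemaString of a Delta metaData action.
SCHEMA_STRING = json.dumps({
    "type": "struct",
    "fields": [
        {
            "name": "x",
            "type": "integer",
            "nullable": True,
            # The invariant is itself a JSON string nested inside the metadata.
            "metadata": {
                "delta.invariants": json.dumps({"expression": {"expression": "x > 3"}})
            },
        },
        {"name": "y", "type": "string", "nullable": True, "metadata": {}},
    ],
})


def extract_invariants(schema_string):
    """Return (column name, SQL expression) pairs for top-level fields only."""
    schema = json.loads(schema_string)
    found = []
    for field in schema["fields"]:
        raw = field.get("metadata", {}).get("delta.invariants")
        if raw is not None:
            found.append((field["name"], json.loads(raw)["expression"]["expression"]))
    return found


invariants = extract_invariants(SCHEMA_STRING)
print(invariants)
```

A writer that wants to respect invariants would still need a SQL engine to evaluate each expression against the incoming data; this sketch only covers discovery.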

rtyler

01/11/2023, 12:52 AM
heh, well I'm on the fence on whether I want to explore using these more then

Will Jones

01/11/2023, 12:52 AM
Use constraints instead
That’s the thing that is favored over invariants
Though we don’t support them yet
It’s on my radar, but whenever I’ve asked users they always seem to prefer prioritizing more operations like UPDATE / DELETE / MERGE over supporting higher protocol versions

rtyler

01/11/2023, 12:58 AM
that makes sense to me

Robert

01/11/2023, 7:23 AM
we also use the InvariantChecker in a dedicated writer we use in the operations module. I kept a separate writer implementation to be able to experiment more without impacting too much of the code base. Eventually the current writer module and the one in the operations module should merge…
correction, the check does not happen in the writer itself, but rather in the write command before passing record batches to the writer… https://github.com/delta-io/delta-rs/blob/4f9dec2054cd8591d92b7a17d5c3ef1acc1c8068/rust/src/operations/write.rs#L363-L367
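
The ordering described here (validate each batch in the write command, then hand it to the writer) can be sketched roughly as follows. This is not the delta-rs code: the names InvariantViolation, check_batch, and write_command are hypothetical, batches are plain lists of dicts, and each invariant is a Python predicate standing in for a SQL expression.

```python
class InvariantViolation(Exception):
    """Raised when a row fails an invariant (hypothetical error type)."""


def check_batch(batch, checks):
    """Raise if any row in the batch violates any (column, description, predicate) check."""
    for column, description, predicate in checks:
        for row in batch:
            if not predicate(row[column]):
                raise InvariantViolation(
                    f"invariant {description!r} violated by {column}={row[column]!r}"
                )


def write_command(batches, checks, sink):
    """Validate each batch first, then pass it on; sink stands in for the writer."""
    for batch in batches:
        check_batch(batch, checks)  # the check happens here, not inside the writer
        sink.append(batch)


# Stand-in for the SQL expression "x > 3".
checks = [("x", "x > 3", lambda value: value is not None and value > 3)]
sink = []
write_command([[{"x": 4}, {"x": 10}]], checks, sink)
```

One consequence of checking batch by batch in this sketch: batches that passed before a later violation have already reached the sink, so a real write path needs to abort before committing anything.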