
John Darrington

07/16/2023, 6:44 PM
So just a quick question: why don't we have anything on the main documentation page about how to write to Delta tables? I'm trying to figure it out, and the information seems scattered across the repo. From what I gather, I build a `DeltaWriter`, write to it, then close it to get the `Add` actions, which I then have to commit myself?
As it states here: https://docs.rs/deltalake/latest/deltalake/writer/record_batch/index.html. It just feels like having it on the front page as well would make sense.
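A toy model of the write, close, commit flow described in the question may help picture it. This is not the deltalake API: `ToyWriter`, `AddAction`, `ToyLog` and `write_and_commit` are hypothetical, stdlib-only stand-ins, and real `Add` actions carry more fields (such as file size and partition values).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a Delta `Add` action (only a path and a row count).
#[derive(Debug, Clone, PartialEq)]
struct AddAction {
    path: String,
    num_records: usize,
}

/// Hypothetical writer: buffers rows and "writes" one data file per `rows_per_file` rows.
struct ToyWriter {
    buffered: Vec<String>,
    adds: Vec<AddAction>,
    rows_per_file: usize,
}

impl ToyWriter {
    fn new(rows_per_file: usize) -> Self {
        assert!(rows_per_file > 0, "rows_per_file must be positive");
        ToyWriter { buffered: Vec::new(), adds: Vec::new(), rows_per_file }
    }

    fn write(&mut self, row: &str) {
        self.buffered.push(row.to_string());
        if self.buffered.len() >= self.rows_per_file {
            self.flush_file();
        }
    }

    // Pretend to write the buffered rows to a new data file and record an Add action for it.
    // (Real writers use unique file names; this toy just numbers files per writer.)
    fn flush_file(&mut self) {
        if self.buffered.is_empty() {
            return;
        }
        let path = format!("part-{:05}.parquet", self.adds.len());
        self.adds.push(AddAction { path, num_records: self.buffered.len() });
        self.buffered.clear();
    }

    /// Flushes any remaining rows and returns the Add actions. Nothing is committed yet.
    fn close(mut self) -> Vec<AddAction> {
        self.flush_file();
        self.adds
    }
}

/// Hypothetical in-memory transaction log mapping version -> actions.
#[derive(Default)]
struct ToyLog {
    commits: BTreeMap<u64, Vec<AddAction>>,
}

impl ToyLog {
    /// Records the actions as the next table version and returns that version.
    fn commit(&mut self, adds: Vec<AddAction>) -> u64 {
        let version = self.commits.keys().next_back().copied().map_or(0, |v| v + 1);
        self.commits.insert(version, adds);
        version
    }
}

/// The three steps from the question: write, close to get Add actions, then commit them yourself.
fn write_and_commit(rows: &[&str], rows_per_file: usize, log: &mut ToyLog) -> (Vec<AddAction>, u64) {
    let mut writer = ToyWriter::new(rows_per_file);
    for row in rows {
        writer.write(row);
    }
    let adds = writer.close();
    let version = log.commit(adds.clone());
    (adds, version)
}
```

The point the model makes is the separation: the writer only produces files plus `Add` actions, and the table does not change until those actions are committed to the log.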

Robert

07/16/2023, 8:57 PM
We actually have two implementations of the `RecordBatchWriter`, or rather, we also have `operations::writer::DeltaWriter`. Consolidating those has been on my todo list for a while. This is an artifact of introducing conflict resolution without breaking downstream applications.
100% agree that this needs to be more discoverable, and we should finally converge on one 🙂
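The conflict resolution Robert mentions presumably concerns the commit step. Delta commits are numbered log entries, and writers use optimistic concurrency: try to write the next version, and if another writer got there first, check for conflicts and retry. The sketch below is a hypothetical in-memory model (`VersionedLog`, `CommitError` and `commit_with_retry` are made up) and deliberately skips the actual conflict check.

```rust
use std::collections::BTreeMap;

/// Why a commit attempt failed in this toy model.
#[derive(Debug, PartialEq)]
enum CommitError {
    /// Another writer already committed this version.
    VersionExists(u64),
    /// Gave up after the allowed number of attempts.
    TooManyAttempts,
}

/// Hypothetical log store with put-if-absent semantics per version.
#[derive(Default)]
struct VersionedLog {
    entries: BTreeMap<u64, Vec<String>>,
}

impl VersionedLog {
    fn try_put(&mut self, version: u64, actions: Vec<String>) -> Result<(), CommitError> {
        if self.entries.contains_key(&version) {
            return Err(CommitError::VersionExists(version));
        }
        self.entries.insert(version, actions);
        Ok(())
    }
}

/// Optimistic commit: try the version after the one we read; if another writer
/// took it, move on to the next version and try again.
fn commit_with_retry(
    log: &mut VersionedLog,
    read_version: Option<u64>,
    actions: Vec<String>,
    max_attempts: usize,
) -> Result<u64, CommitError> {
    let mut version = read_version.map_or(0, |v| v + 1);
    for _ in 0..max_attempts {
        match log.try_put(version, actions.clone()) {
            Ok(()) => return Ok(version),
            // A real implementation would inspect the winning commit for logical
            // conflicts here before retrying; this toy assumes there are none.
            Err(CommitError::VersionExists(_)) => version += 1,
            Err(other) => return Err(other),
        }
    }
    Err(CommitError::TooManyAttempts)
}
```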

Will Jones

07/16/2023, 9:07 PM
I'd add that we probably have two different write use cases: users who have streams of rows (like the Kafka ingest project) and users who have columnar batches (from Polars, DataFusion, or elsewhere). We should be able to guide each type of user to the right interface.
As an aside: a while ago I started to look at improving our rust docs, but found that our module structure was such a mess that I felt like we should fix that up first. Created https://github.com/delta-io/delta-rs/issues/1136 . I think that’s nearly done.
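To illustrate the two use cases: Parquet is columnar, so a row-oriented writer has to buffer incoming rows and transpose them into columns before writing, while a columnar caller already has batches. A minimal sketch, assuming hypothetical `Row` and `ColumnBatch` types in place of Arrow's `RecordBatch`:

```rust
/// Hypothetical row type, e.g. one decoded message from a stream.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    name: String,
}

/// Hypothetical columnar batch: one Vec per column, all of the same length.
#[derive(Debug, Default, PartialEq)]
struct ColumnBatch {
    ids: Vec<i64>,
    names: Vec<String>,
}

impl ColumnBatch {
    fn len(&self) -> usize {
        self.ids.len()
    }

    fn is_empty(&self) -> bool {
        self.ids.is_empty()
    }
}

/// Buffers a stream of rows into columnar batches of at most `batch_size` rows.
fn rows_to_batches<I: IntoIterator<Item = Row>>(rows: I, batch_size: usize) -> Vec<ColumnBatch> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batches = Vec::new();
    let mut current = ColumnBatch::default();
    for row in rows {
        // Transpose: each field goes into its own column vector.
        current.ids.push(row.id);
        current.names.push(row.name);
        if current.len() == batch_size {
            batches.push(std::mem::take(&mut current));
        }
    }
    // Emit the final, possibly short, batch.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

A columnar user would skip this step entirely and hand batches straight to the writer, which is why the two groups may be best served by different entry points.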

John Darrington

07/16/2023, 10:53 PM
Lol I feel like each time I ask a question I open a can of worms, in a good way. First the locking store and now documentation :D

Will Jones

07/16/2023, 11:01 PM
Ha, sorry if I make it feel more complicated than it is. I just like to provide additional context when I can 🙂

John Darrington

07/16/2023, 11:02 PM
I do the exact same thing so I get it

Will Jones

07/16/2023, 11:02 PM
Any incremental improvement is very welcome

Matthew Powers

07/17/2023, 2:38 PM
@John Darrington, thanks for the great questions, and hooray for the culture of constant incremental improvement.

John Darrington

07/17/2023, 4:30 PM
Of course. I’ll see what I can do to help with documentation

Robert

07/17/2023, 8:41 PM
We also still have https://github.com/delta-io/delta-rs/pull/1434 open, which implements most of what Will's issue mentions. Specifically, it tries to isolate future kernel stuff a bit more. Would be happy to re-open it, or cherry-pick some re-orgs based on your feedback.

rtyler

07/17/2023, 8:42 PM
@Robert I was actually looking at that before I cut the last Rust release; I wasn't certain whether I should try to resolve a bunch of those conflicts or not. I feel really bad you did that work right before summit and everything got stale really quickly 😕

Robert

07/17/2023, 8:45 PM
No worries 🙂, just did some merging with main and that was quite quickly resolved ...
> I'd add that we probably have two different write use cases: one is users who have streams of rows (like the kafka ingest project) and the other who have columnar batches (from polars or datafusion or elsewhere). So we should be able to guide each user type to the right interface.
Fully agree. I think Kafka ingest only uses the JSON writer (right @rtyler?). Right now it's still not quite a streaming API for writing the JSON batches. AFAIK, Tyler recently moved to the new JSON decoder APIs already, which can quite readily be made to process streams; I just did something similar in kernel 🙂.
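What makes a decoder "streamable" is its push-style shape: bytes are fed in as they arrive, complete records are handed out, and partial lines are carried over to the next chunk. The sketch below only frames newline-delimited JSON records; it does not parse JSON or build Arrow batches (the arrow-json decoder mentioned above would do that). `NdjsonChunker` and its methods are hypothetical.

```rust
/// Hypothetical push-style framer for newline-delimited JSON.
/// Bytes may arrive in arbitrary chunks; only complete lines are emitted.
#[derive(Default)]
struct NdjsonChunker {
    partial: Vec<u8>,   // bytes of an incomplete trailing line
    ready: Vec<String>, // complete lines waiting to be flushed
}

impl NdjsonChunker {
    /// Feed the next chunk of bytes from the stream. Splitting on b'\n' is safe
    /// for UTF-8 input because 0x0A never occurs inside a multi-byte sequence.
    fn decode(&mut self, chunk: &[u8]) {
        for &b in chunk {
            if b == b'\n' {
                let line = std::mem::take(&mut self.partial);
                let text = String::from_utf8_lossy(&line).trim().to_string();
                if !text.is_empty() {
                    self.ready.push(text);
                }
            } else {
                self.partial.push(b);
            }
        }
    }

    /// Hand back the complete records seen so far. A real writer would turn
    /// these into a RecordBatch and write it out at this point.
    fn flush(&mut self) -> Vec<String> {
        std::mem::take(&mut self.ready)
    }

    /// End of stream: treat trailing bytes without a newline as a final record.
    fn finish(mut self) -> Vec<String> {
        if !self.partial.is_empty() {
            self.decode(b"\n");
        }
        self.flush()
    }
}
```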
πŸ‘ 1