rtyler

03/25/2023, 8:04 PM
@Will Jones I'm hacking on top of your recently merged changes to arrow-rs, there's some API changes in arrow upstream that are incompatible with delta-rs, have you already started working on that, or should I continue cleaning up the compat in my branch?

Will Jones

03/25/2023, 8:04 PM
I haven’t. I was just planning on doing the upstream thing

rtyler

03/25/2023, 8:05 PM
ah okay, I will continue then 👍 . This move from Decoder to RawReader for JSON I think is definitely going to help some of the memory contention we have seen in #kafka-delta-ingest on handling checkpoints, but it may require some hackery to be compatible
I don't follow arrow closely enough, how soon do you think your changes will be pushed into a release?

Will Jones

03/25/2023, 8:06 PM
They just cut an RC right before, so it will probably take 2-3 weeks for the next release

rtyler

03/25/2023, 8:06 PM
drats!

Will Jones

03/25/2023, 8:07 PM
Yeah unfortunate timing. We can try to create a new release of delta-rs as soon as that comes out
There’s some other changes we need to do on the schema to make map types work well in delta-rs

rtyler

03/25/2023, 8:09 PM
got tickets for reference? I'm curious, I hit this snag hard while working with map types for partition columns, and I was kind of shocked that nobody else had hit it already

Will Jones

03/25/2023, 9:11 PM
This might just be an issue with lists, but it might affect map too: √

rtyler

03/25/2023, 9:41 PM
@Will Jones have you worked with this `RawDecoder` interface much before? I've gotten stuck trying to figure out why `flush()` isn't causing it to hit the end of this buffer 😕

Will Jones

03/25/2023, 9:43 PM
I haven’t worked with JSON in arrow-rs yet

rtyler

03/25/2023, 9:43 PM
I have a thread to pull on 🤔
`RawDecoder` has a `decode()` method, but it doesn't advance a reader or anything like that for you, so you have to manage how much of your buf has been read and bail out, otherwise it will re-decode over and over again
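The offset bookkeeping described in that message can be sketched with a toy decoder. This is not the arrow-rs API: `LineDecoder` and `decode_all` are hypothetical names, and the sketch assumes a push-style decoder whose `decode()` returns the number of bytes it consumed and stops early once a batch is full. Check the real `RawDecoder` signature before borrowing the pattern.

```rust
/// Hypothetical stand-in for a push-based decoder (NOT arrow's RawDecoder).
/// It splits newline-delimited records into batches of `batch_size` rows.
struct LineDecoder {
    batch_size: usize,
    rows: Vec<String>,
    partial: Vec<u8>,
}

impl LineDecoder {
    fn new(batch_size: usize) -> Self {
        LineDecoder { batch_size, rows: Vec::new(), partial: Vec::new() }
    }

    /// Consume bytes from the front of `buf` until the batch is full or
    /// `buf` is exhausted. Returns how many bytes were consumed. The decoder
    /// keeps no cursor into `buf`; the caller must slice off consumed bytes.
    fn decode(&mut self, buf: &[u8]) -> usize {
        let mut consumed = 0;
        for &byte in buf {
            if self.rows.len() == self.batch_size {
                break; // batch is full: caller has to flush before continuing
            }
            consumed += 1;
            if byte == b'\n' {
                if !self.partial.is_empty() {
                    let line = String::from_utf8_lossy(&self.partial).into_owned();
                    self.rows.push(line);
                    self.partial.clear();
                }
            } else {
                self.partial.push(byte);
            }
        }
        consumed
    }

    /// Hand back the complete rows buffered so far, if any.
    /// A trailing record without a newline stays buffered.
    fn flush(&mut self) -> Option<Vec<String>> {
        if self.rows.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.rows))
        }
    }
}

/// Drive the decoder over the whole buffer, tracking the read offset ourselves.
fn decode_all(decoder: &mut LineDecoder, buf: &[u8]) -> Vec<Vec<String>> {
    let mut batches = Vec::new();
    let mut offset = 0;
    while offset < buf.len() {
        // Pass only the unread tail; passing `buf` again would re-decode it.
        let consumed = decoder.decode(&buf[offset..]);
        offset += consumed;
        if offset < buf.len() {
            // The decoder stopped early, so a batch is ready to be flushed.
            match decoder.flush() {
                Some(batch) => batches.push(batch),
                None if consumed == 0 => break, // no progress: bail out
                None => {}
            }
        }
    }
    // Flush whatever is left once the buffer is exhausted.
    if let Some(batch) = decoder.flush() {
        batches.push(batch);
    }
    batches
}
```

With a batch size of 2 and three newline-delimited records, `decode_all` yields one batch of two rows and one batch of one row. Calling `decoder.decode(buf)` in a loop without advancing `offset` would instead keep feeding the same leading bytes back in, which matches the "re-decode over and over again" symptom.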
making good progress on the upgrade in this branch, there's some wackiness with errors coming out of serde_json but I'll have that resolved shortly
hah! I think I found a bug in our test code too! https://github.com/delta-io/delta-rs/blob/main/rust/src/writer/json.rs#L476- This is invalid according to the ArrowSchema we pass in, since `value` is supposed to be an `int32` 🤔 The `Decoder` interface is.. forgiving of the schema but I think the RawDecoder may be stricter.
I will have an update to my pull request ready in about 15 minutes once I rewrite my failing test which should pass with the new arrow
pushed up to https://github.com/delta-io/delta-rs/pull/1226 the test I added is still failing, and I'm working to understand that now
there we go, pushed passing tests up