rtyler

03/16/2023, 2:30 AM
Anybody happen to have example Rust code floating around which writes a String,String map? I am having a hell of a time battling Arrow's wacky types
Appealing for some help from the Arrow folks, I think I have probably tried every possible way of constructing maps in Arrow and no luck thus far https://github.com/apache/arrow-rs/issues/3875 🙀
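For readers wondering why map columns are awkward to build by hand: in Arrow's columnar format a map column is stored as a list of key/value entries, that is, one flat run of keys, one flat run of values, and an offsets vector marking where each row's entries begin and end. The sketch below models that layout using only the Rust standard library. `ToyMapColumn` and its methods are hypothetical names made up for illustration; this is not the arrow-rs `MapArray` or `MapBuilder` API.

```rust
/// Toy model of a String -> String map column (hypothetical type,
/// NOT the arrow-rs `MapArray`): flat keys, flat values, and offsets.
#[derive(Debug)]
struct ToyMapColumn {
    /// offsets[i]..offsets[i + 1] is the entry range for row i,
    /// so offsets always has (number of rows + 1) elements.
    offsets: Vec<usize>,
    keys: Vec<String>,
    values: Vec<String>,
}

impl ToyMapColumn {
    fn new() -> Self {
        ToyMapColumn { offsets: vec![0], keys: Vec::new(), values: Vec::new() }
    }

    /// Append one row (one map) by pushing its entries onto the flat
    /// key and value buffers, then recording where the row ends.
    fn append_row(&mut self, entries: &[(&str, &str)]) {
        for (k, v) in entries {
            self.keys.push(k.to_string());
            self.values.push(v.to_string());
        }
        self.offsets.push(self.keys.len());
    }

    /// Read row i back as a list of (key, value) pairs.
    fn row(&self, i: usize) -> Vec<(&str, &str)> {
        let (start, end) = (self.offsets[i], self.offsets[i + 1]);
        (start..end)
            .map(|j| (self.keys[j].as_str(), self.values[j].as_str()))
            .collect()
    }

    /// Number of rows (maps) in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

An empty map is simply a row whose start and end offsets are equal, which is one reason the nested types look stranger than a plain `HashMap<String, String>` per row.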
Siva

03/16/2023, 3:51 PM
Not sure if this helps, but take a look at https://github.com/apache/arrow-rs/blob/master/arrow-integration-test/src/schema.rs where field `c37` is a map type.
rtyler

03/16/2023, 3:55 PM
I had not seen that source code, thanks for the link. What's interesting is that the JSON Decoder doesn't appear to produce that data type when parsing a JSON buffer with a map in it. The `c37` field does look like what the `MapBuilder` produces, however.
Unfortunately, I couldn't get the writer to write either of them 🙀
I'm making some interesting progress, I think I might have found a bug in how we handle partition columns. Basically, writes of this data, which has a map in it, work when there are no partition columns 🤦
Yes, this is looking more and more like our bug. A table with partition columns cannot be written if it contains a map type because a `take()` call is being issued when it shouldn't be
now, why this is a runtime issue rather than a compile-time issue is perplexing, because 🦀
The problem is that `take` isn't implemented for `MapArray` here: https://github.com/delta-io/delta-rs/blame/main/rust/src/writer/record_batch.rs#L400
I've been discussing this a bit elsewhere with @Christian Williams, and we definitely have workloads using maps with partition columns, so this must be a regression between the old writer code which #kafka-delta-ingest is using and the latest `HEAD` of delta-rs
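To illustrate the failure mode described above, here is a minimal standard-library-only sketch. It assumes the writer splits a batch by partition value and then selects each partition's rows from every column with a take-style operation. Because the column's type is only known at runtime, an unsupported type surfaces as a runtime error rather than a compile error. `ToyColumn`, `toy_take`, and `partition_indices` are hypothetical names for illustration; this is not the arrow-rs kernel or the delta-rs writer code.

```rust
/// Toy stand-in for a dynamically typed column (hypothetical type,
/// not the arrow-rs or delta-rs API).
#[derive(Debug, Clone, PartialEq)]
enum ToyColumn {
    Utf8(Vec<String>),
    Int64(Vec<i64>),
    /// One row is a list of (key, value) entries.
    Map(Vec<Vec<(String, String)>>),
}

/// Select rows by index, dispatching on the variant at runtime.
/// The Map arm is deliberately unsupported to mimic a missing kernel.
fn toy_take(col: &ToyColumn, indices: &[usize]) -> Result<ToyColumn, String> {
    match col {
        ToyColumn::Utf8(v) => {
            Ok(ToyColumn::Utf8(indices.iter().map(|&i| v[i].clone()).collect()))
        }
        ToyColumn::Int64(v) => {
            Ok(ToyColumn::Int64(indices.iter().map(|&i| v[i]).collect()))
        }
        ToyColumn::Map(_) => Err("take not supported for Map".to_string()),
    }
}

/// Group row indices by partition value, preserving first-seen order.
fn partition_indices(partition_col: &[&str]) -> Vec<(String, Vec<usize>)> {
    let mut groups: Vec<(String, Vec<usize>)> = Vec::new();
    for (row, value) in partition_col.iter().enumerate() {
        match groups.iter_mut().find(|(k, _)| k.as_str() == *value) {
            Some((_, idx)) => idx.push(row),
            None => groups.push((value.to_string(), vec![row])),
        }
    }
    groups
}
```

In this toy model, a write with no partition columns never needs `toy_take`, which matches the observation that unpartitioned writes of the same data succeed.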
hm, the `JsonWriter` code behaves differently than the `RecordBatchWriter` 🤔
Introducing a failing test: https://github.com/delta-io/delta-rs/pull/1226
Still figuring out what can be done to support this behavior though