rtyler

04/05/2023, 5:10 PM
@Yousry Mohamed I read through your blog post about doing writes with the deltalake crate. I'm curious how you feel about the ergonomics of using the CreateBuilder compared to some patterns we have floating around which use JSON for schema definition, such as in kafka-delta-ingest.
Yousry Mohamed

04/05/2023, 5:42 PM
@rtyler Both approaches have their pros and cons. I guess CreateBuilder has the benefit of being statically typed code supported by IDE checks and auto-completion, especially if primitive data types are declared as enums or something similar. Both are testable, but JSON has the benefit of treating the schema as data that can be stored in files or databases and sent around, which is more flexible than the builder method. For people new to Rust, JSON is more readable, as Rust code can sometimes be scary. But because I come from a C# background, I still prefer the builder due to its fluent API nature.
rtyler

04/05/2023, 5:43 PM
I see 🙂 Have you tried to create more complex columns with the builder syntax? I have found it to be quite difficult when creating maps and structs, but I might just not be smart enough 🙂
Yousry Mohamed

04/05/2023, 5:50 PM
I haven’t tried complex types, but I agree it looks much more difficult. I will give it a go out of curiosity 🙂
denny.g.lee

04/05/2023, 6:54 PM
Would be super interesting to find out - definitely let us know what you find, eh?!
๐Ÿ‘ 1
Yousry Mohamed

04/10/2023, 7:08 AM
It is definitely verbose, but ultimately a struct, for example, is just a group of fields, so the only extra bit needed is the envelope:
with_column(
    "complex",
    SchemaDataType::r#struct(SchemaTypeStruct::new(vec![
        SchemaField::new(
            "f1".to_string(),
            SchemaDataType::primitive(String::from("integer")),
            false,
            HashMap::new() as HashMap<String, Value>,
        ),
        SchemaField::new(
            "f2".to_string(),
            SchemaDataType::primitive(String::from("string")),
            false,
            HashMap::new() as HashMap<String, Value>,
        ),
    ])),
    false,
    Default::default(),
)
The above is for the table creation (schema definition) side of things. Writing to the table using Rust follows a similar pattern but is much more verbose. It includes adding the complex type to ArrowSchema as DataType::Struct and then feeding the data via a StructArray.
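For comparison with the JSON-based schema definition raised at the start of the thread, here is a rough sketch of how the same "complex" struct column looks in the schema JSON layout described by the Delta protocol. It uses only the standard library. The `Ty` and `Field` types and the `required` and `complex_column` helpers are hypothetical stand-ins invented for illustration; they are not types or functions from the deltalake crate.

```rust
// Hypothetical stand-in types for illustration only.
// These are NOT the deltalake crate's SchemaDataType / SchemaField.
#[derive(Debug, Clone)]
enum Ty {
    Primitive(&'static str),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone)]
struct Field {
    name: &'static str,
    ty: Ty,
    nullable: bool,
}

// Helper that hides the per-field boilerplate (non-nullable, empty metadata).
fn required(name: &'static str, ty: Ty) -> Field {
    Field { name, ty, nullable: false }
}

impl Ty {
    // Primitives serialize as a JSON string; structs as
    // {"type":"struct","fields":[...]}, following the Delta schema layout.
    fn to_json(&self) -> String {
        match self {
            Ty::Primitive(p) => format!("\"{}\"", p),
            Ty::Struct(fields) => {
                let inner: Vec<String> = fields.iter().map(Field::to_json).collect();
                format!("{{\"type\":\"struct\",\"fields\":[{}]}}", inner.join(","))
            }
        }
    }
}

impl Field {
    // Note: names are written as-is (no JSON escaping), which is fine
    // for this sketch but not for arbitrary column names.
    fn to_json(&self) -> String {
        format!(
            "{{\"name\":\"{}\",\"type\":{},\"nullable\":{},\"metadata\":{{}}}}",
            self.name,
            self.ty.to_json(),
            self.nullable
        )
    }
}

// The same column as in the builder snippet: a struct with fields f1 and f2.
fn complex_column() -> Field {
    required(
        "complex",
        Ty::Struct(vec![
            required("f1", Ty::Primitive("integer")),
            required("f2", Ty::Primitive("string")),
        ]),
    )
}

fn main() {
    println!("{}", complex_column().to_json());
}
```

Running this prints a single-line JSON object for the column, in which the struct "envelope" is just the nested `{"type":"struct","fields":[...]}` object. A small helper like `required` is one way to trim the repeated nullable and metadata arguments on the builder side as well.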