Hey all! I'm new to the community, but it's been cool to see all of the chat here! I've been working with Delta Lake / Databricks over the past two years and stumbled across delta-rs about a year ago. For someone with no Rust experience (but a capable Python, R, and SQL programmer), how would you recommend they get involved? I'm trying to wrap my mind around how all of this is put together and in what ways I could contribute.
For example, I've seen Arrow and DataFusion mentioned. Is DataFusion the key to accomplishing a lot of what delta-rs wants to do? I would love to contribute to breaking Delta free of being a Spark-only file format.
Final note: I work in public health, and the majority of our people use R as their primary language. I would love to help bring Delta Lake to our R users (whether that be through delta-rs or through DuckDB in the future).
04/14/2023, 6:20 PM
Hello! I think a good place to start is trying out the packages on your data and reporting any bugs or challenges in the API. For example, I’d be curious to see what it looks like to load Delta Lake data into R using reticulate and the Python deltalake package (does it work? is it harder than it should be?).
At some point we might start an R package for deltalake, especially once we have a DuckDB plugin (hopefully later this year). So down the road we would like help testing and designing that.
W. Logan Downing
04/14/2023, 6:24 PM
Thanks Will, I'll give it a shot locally and see what happens! Looking forward to learning alongside the rest of you!
04/17/2023, 4:56 PM
I would love to see more native Delta Lake support for R too! R is so important for healthcare and life sciences. Every statistician and bio researcher uses it extensively, and Delta tables would dovetail really beautifully with their toolkit. Huge impact for medical researchers.