W. Logan Downing

04/14/2023, 6:13 PM
Hey all! I'm new to the community but it's been cool to see all of the chat here! I've been working with Delta Lake / Databricks over the past two years and stumbled across delta-rs about a year ago. For someone with no Rust experience (but a capable Python, R, SQL programmer), how would you recommend they get involved? I'm trying to wrap my mind around how all of this is put together and what ways I could contribute. For example, I've seen Arrow and DataFusion mentioned. Is DataFusion the key to accomplishing a lot of what delta-rs wants to do? I would love to contribute to breaking Delta free of being a Spark-only file format. Final note, I work in public health and the majority of our people use R as their primary language. I would love to help make it a reality to bring Delta Lake to our R users (whether that be through delta-rs or through DuckDB in the future).
❤️ 1

Will Jones

04/14/2023, 6:20 PM
Hello! I think a good place to start is trying out the packages on your data and reporting any bugs or challenges in the API. For example, I'd be curious to see what it looks like to load Delta Lake data into R using reticulate and the Python deltalake package (does it work? is it harder than it should be?).
At some point we might start an R package for deltalake, especially once we have a DuckDB plugin (hopefully later this year). So down the road we would like help testing and designing that.
🎉 1

W. Logan Downing

04/14/2023, 6:24 PM
Thanks Will, I'll give it a shot locally and see what happens! Looking forward to learning alongside the rest of you!
🙌 3

Jim Hibbard

04/17/2023, 4:56 PM
I would love to see more native Delta Lake support for R too! R is so important for healthcare and life sciences. Every statistician and bio researcher uses it extensively and Delta Tables would dovetail really beautifully in their toolkit. Huge impact for medical researchers.