Luan Carvalho

05/27/2023, 8:37 PM
Hello delta-rs community! I am planning to use delta-rs to write to Delta tables stored in S3 or on Databricks. Is this possible? If so, how? I've read the documentation and couldn't find a section that makes this clear.

Will Jones

05/27/2023, 8:59 PM
Yes, we support S3, as well as Azure Blob Storage and GCS. https://delta-io.github.io/delta-rs/python/usage.html#loading-a-delta-table
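As a rough illustration of the linked docs section, here is a minimal sketch of opening an S3-backed table with the `deltalake` Python package. The helper names, bucket path, and region are hypothetical, and the sketch assumes AWS credentials are available from the environment or an instance role.

```python
def s3_storage_options(region):
    """Build the options dict handed to deltalake for an S3-backed table."""
    # Assumption: credentials come from the environment
    # (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or an instance role,
    # so only the region is set explicitly here.
    return {"AWS_REGION": region}


def load_table(table_uri, region):
    """Open an existing Delta table at an s3:// URI."""
    # Imported lazily so the options helper works without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(table_uri, storage_options=s3_storage_options(region))


# Example call (hypothetical bucket and path, needs real S3 access):
#   dt = load_table("s3://my-bucket/path/to/table", "us-east-1")
#   print(dt.version())
```

The same `storage_options` dict can be passed to the write functions, which is where the single-writer caveat discussed further down in this thread applies.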

Luan Carvalho

05/27/2023, 9:17 PM
Awesome! One more question: does delta-rs support multiple writers to the same table stored in S3?

Will Jones

05/27/2023, 9:49 PM
No, not without special configuration.

rtyler

05/27/2023, 10:06 PM
@Luan Carvalho Databricks operates what's commonly known as the S3 Commit Service inside a given workspace. This allows for concurrent writers by DBR-based applications. With delta-rs we have some DynamoDB-based locking code which can be used, but that will not guarantee safe concurrent writes between an unmodified Rust writer and an unmodified Spark-based writer. The pattern that I take is that tables which Rust writers will write to are _only_ written to by Rust writers, except for some special scenarios where I make my Spark code use the same DynamoDB lock semantics.

Luan Carvalho

05/27/2023, 10:16 PM
When you mention DBR-based applications, do you mean Spark applications inside the Databricks environment, or delta-rs inside Databricks?

rtyler

05/27/2023, 10:22 PM
Spark via DBR. There is no version of delta-rs code which can utilize the Databricks proprietary commit service.

Luan Carvalho

05/28/2023, 12:59 AM
Awesome, guys!
So, I want to write a custom query listener in my Spark streaming job and use delta-rs to write all the streaming progress objects to a data table. If all the streaming queries write to the same table, I need to use the DynamoDB lock, right?

rtyler

05/28/2023, 2:48 AM
Readers need not concern themselves with the lock at all. If you have multiple threads or processes attempting to write transactions independently, those should use the DynamoDB lock. That is already built into the writer code in delta-rs, so all you need to do is configure the DynamoDB table appropriately.
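For anyone finding this thread later, here is a minimal sketch of what that configuration might look like from Python. The option keys `AWS_S3_LOCKING_PROVIDER` and `DYNAMO_LOCK_TABLE_NAME` are the ones used by delta-rs releases from around the time of this conversation; treat them as an assumption and check the docs for your version, since later releases renamed some keys. The helper names, lock table name, and bucket path are hypothetical, and the DynamoDB table itself must already exist.

```python
def locking_storage_options(lock_table, region):
    """Storage options that enable the DynamoDB-based lock for S3 writes."""
    # Assumption: key names as used by delta-rs releases from around
    # mid-2023; newer releases may expect different keys.
    return {
        "AWS_REGION": region,
        "AWS_S3_LOCKING_PROVIDER": "dynamodb",
        "DYNAMO_LOCK_TABLE_NAME": lock_table,
    }


def append_progress(table_uri, records, storage_options):
    """Append a list of dicts (e.g. streaming progress events) to a table."""
    # Imported lazily so the options helper works without these packages.
    import pyarrow as pa
    from deltalake import write_deltalake

    data = pa.Table.from_pylist(records)
    write_deltalake(table_uri, data, mode="append",
                    storage_options=storage_options)


# Example call (hypothetical names, needs real S3 and DynamoDB access):
#   opts = locking_storage_options("delta_rs_lock_table", "us-east-1")
#   append_progress("s3://my-bucket/progress", [{"batch_id": 1}], opts)
```

Every writer process should use the same lock table. As noted above, this coordinates delta-rs writers with each other, not with unmodified Spark writers.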