
Mike Eastham

07/11/2023, 7:48 PM
I apologize if this is answered somewhere else, but am I correct that the DynamoDB lock client implemented in delta-rs uses a different locking protocol than the S3DynamoDBLogStore in the JVM implementation? So I couldn’t, e.g., safely have a task using delta-rs and a Spark job updating the same table concurrently?

rtyler

07/11/2023, 7:49 PM
You are correct about the first part, in that they rely on different semantics. We have a number of Spark jobs which interact with delta-rs written tables, but we just added some similar DynamoDB lock-respecting code to those Spark jobs

Mike Eastham

07/11/2023, 7:51 PM
got it, thank you

Dominique Brezinski

07/11/2023, 10:58 PM
What is the reason they are different implementations?

rtyler

07/11/2023, 11:03 PM
The implementation in delta-rs preceded the released S3DynamoDBLogStore by almost a year 😛 They're functionally a little different in that 🦀 is just using a leased lock but not really putting any data into Dynamo
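
For readers unfamiliar with the term, the idea of a leased lock can be sketched roughly as below. This is a minimal in-memory illustration and not delta-rs code: the `LeasedLockTable` class, its method names, and the explicit `now` argument are all hypothetical, and a dictionary stands in for the DynamoDB table. The point it shows is that the table only records who holds the lock and until when; no commit data is stored in it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    owner: str
    expires_at: float


class LeasedLockTable:
    """Hypothetical in-memory stand-in for a lock table.

    It stores lock records only (owner and expiry), never commit data.
    """

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def try_acquire(self, key: str, owner: str, now: float, lease_seconds: float) -> bool:
        current = self._leases.get(key)
        # Refuse only if another owner holds a lease that has not expired yet.
        if current is not None and current.owner != owner and current.expires_at > now:
            return False
        # Otherwise take (or renew) the lease for a bounded time.
        self._leases[key] = Lease(owner, now + lease_seconds)
        return True

    def release(self, key: str, owner: str) -> bool:
        current = self._leases.get(key)
        # Only the current holder may release the lock.
        if current is None or current.owner != owner:
            return False
        del self._leases[key]
        return True
```

A writer would acquire the lease, write the commit file to storage, and then release. A second writer can take over only once the lease has expired, which protects against a crashed holder blocking the table forever.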

Dominique Brezinski

07/11/2023, 11:06 PM
I recall, but would it not make sense to unify them now? Seems beneficial that delta-rs and Spark could safely operate on the same tables.

rtyler

07/11/2023, 11:07 PM
they can be unified, if you're looking to get your hands dirty @Dominique Brezinski I can pair with you to get started 😉

Mike Eastham

07/11/2023, 11:09 PM
do you have a ballpark guess of how much work it might be to support the Spark protocol? If it’s not a huge amount it’s something we could potentially look at doing

rtyler

07/11/2023, 11:12 PM
The way S3DynamoDBLogStore works, I believe it would be relatively straightforward to support, so I wouldn't anticipate more than a week or two for somebody experienced with ObjectStore (since I think that's the layer where this could be added)
@Mike Eastham who is "we"? 🙂
I wasn't being flip, I'm happy to pair with somebody to orient them, I've been working on refactoring some locking code lately

Mike Eastham

07/11/2023, 11:22 PM
“we” is the engineers at my company. We’re not yet using delta-rs in production, but I’m evaluating it. Our use case involves interacting with tables that are written by Spark jobs using S3DynamoDBLogStore, so interoperability that doesn’t require adding extra locking to the Spark jobs would be a pretty big plus. I’ll get in touch if it seems like it’s something we could devote some time to
👍 2

Nick Karpov

07/13/2023, 4:12 PM
@rtyler @Mike Eastham this is awesome, happy to contribute here too. Here's the original design doc for the modifications required to both the read and write paths to the log: https://docs.google.com/document/d/1Gs4ZsTH19lMxth4BSdwlWjUNR-XhKHicDvBjd2RqNd8/edit#heading=h.mjjuxw9mcz9h There were also some prior efforts that may be worth looking at here: https://github.com/delta-io/delta-rs/issues/1333 (I'm not sure if it's still relevant now that it may live in a separate repo)
👍 2

rtyler

07/13/2023, 4:45 PM
@Nick Karpov I think I see a path to compatible DynamoDB usage. I'm working at the moment to extract the DynamoDB locking code from delta-rs anyway, since it needs to move to make way for some other kernel-y work

Nick Karpov

08/15/2023, 7:00 PM
btw the Rivian team is making awesome progress on the conditional write approach: https://github.com/rivian/delta-go/pull/25/files#diff-f7ba0b4152fee30d7d7ac3260f9384f662075fcd95a448058ae75efb40c335cf 🥳 (cc @Rahul Madnawat) ... it will be an awesome achievement when all 3 can safely interop on S3 😄
🙌 2
👀 1
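
As a rough illustration of what a conditional write ("put if absent") approach means in general, here is a minimal in-memory sketch. It is not taken from the linked PR or from S3DynamoDBLogStore: the `ConditionalCommitStore` class, `CommitConflict`, and the method names are hypothetical, and a dictionary stands in for the external store. The assumption shown is that each commit version is claimed by writing an entry that succeeds only if no entry for that version exists yet, so concurrent writers in different engines cannot both claim the same version.

```python
from typing import Dict, Tuple


class CommitConflict(Exception):
    """Raised when another writer has already claimed the version."""


class ConditionalCommitStore:
    """Hypothetical in-memory stand-in for a store with conditional puts."""

    def __init__(self) -> None:
        # Maps (table path, version) to the writer that claimed it.
        self._entries: Dict[Tuple[str, int], str] = {}

    def put_if_absent(self, table_path: str, version: int, writer: str) -> None:
        key = (table_path, version)
        # The write succeeds only if nobody has written this key before.
        if key in self._entries:
            raise CommitConflict(f"version {version} already committed")
        self._entries[key] = writer

    def latest_version(self, table_path: str) -> int:
        # Returns -1 when the table has no commits recorded yet.
        versions = [v for (p, v) in self._entries if p == table_path]
        return max(versions, default=-1)
```

A writer that hits `CommitConflict` would re-read the latest version, check whether its changes still apply, and retry with the next version number.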

Rahul Madnawat

08/15/2023, 7:04 PM
Super excited!!!