Grainne B

01/23/2023, 12:55 AM
Hello all! The issue below, which prevented writing Delta tables to S3, has seemingly been resolved in https://github.com/delta-io/delta-rs/issues/911, as fixed in https://github.com/delta-io/delta-rs/pull/893
{PyDeltaTableError('Failed to read delta log object: Generic DeltaS3ObjectStore error: Atomic rename requires a LockClient for S3 backends. Either configure the LockClient, 
        or set AWS_S3_ALLOW_UNSAFE_RENAME=true to opt out of support for concurrent writers.')}
However, the fix seems to just allow passing AWS_S3_ALLOW_UNSAFE_RENAME as an option. This can cause issues with concurrent writes, since no lock is acquired before the rename. I was hoping the solution would involve acquiring a lock on the bucket and releasing it after the write was performed. Does the above solution mean that concurrent writes are not supported/advised with Delta Lake? Keen to hear how others are approaching this issue 🙏

Will Jones

01/23/2023, 3:10 AM
Sorry this isn’t well documented right now. In the error message, configuring the “LockClient for S3 backends” means:
1. Set AWS_S3_LOCKING_PROVIDER to "dynamodb".
2. Configure the DynamoDB lock using the options defined here: https://github.com/delta-io/delta-rs/blob/b7ea1710381157ea7d3f023995cae2d87dad6a5c/dynamodb_lock/src/lib.rs#L61-L79
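As a rough sketch of the two configurations discussed above, the helper below builds the options for either mode. It assumes the Python `deltalake` package and that `write_deltalake` accepts these keys through its `storage_options` argument (setting the same names as environment variables should be equivalent). The helper names `s3_storage_options` and `write_table` are hypothetical, and the DynamoDB lock table settings from the linked file are deliberately left out, so check them against your version.

```python
from typing import Dict


def s3_storage_options(use_dynamodb_lock: bool) -> Dict[str, str]:
    """Build S3 options for a Delta table write (hypothetical helper)."""
    if use_dynamodb_lock:
        # Concurrent writers: the rename is guarded by a DynamoDB-backed lock.
        # The lock table itself is configured separately (see the linked options).
        return {"AWS_S3_LOCKING_PROVIDER": "dynamodb"}
    # Single writer only: opts out of locking, so concurrent commits are unsafe.
    return {"AWS_S3_ALLOW_UNSAFE_RENAME": "true"}


def write_table(table_uri: str, data, use_dynamodb_lock: bool = True) -> None:
    """Append data to a Delta table on S3 using the options above."""
    # Imported lazily so the helper above works without deltalake installed.
    from deltalake import write_deltalake

    write_deltalake(
        table_uri,
        data,
        mode="append",
        storage_options=s3_storage_options(use_dynamodb_lock),
    )
```

The unsafe-rename branch is only reasonable when you can guarantee a single writer per table.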
“Does the above solution mean that concurrent writes are not supported/advised with Delta Lake?”
In delta-rs we support concurrent writes for all backends, but AWS is a special case that requires extra configuration, since S3 itself doesn’t have a way to atomically rename objects. (GCS, Azure Blob, and local filesystems all work.)
Our support for concurrent writes isn’t that resilient yet, though. Right now a concurrent write may sometimes fail instead of retrying, but we are working on improvements to the conflict resolver so that it succeeds unless the two concurrent updates are totally incompatible (for example, one overwrites the schema while the other appends new data with the old schema).
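Until the conflict resolver retries on its own, one caller-side workaround is to retry a failed write with backoff. This is only a minimal sketch, not something delta-rs provides: `retry_commit` is a hypothetical helper, and the exception type to retry on is left to the caller because the thread does not say which error a commit conflict raises. It only makes sense for writes that are safe to reapply, such as plain appends.

```python
import random
import time
from typing import Callable, Tuple, Type


def retry_commit(
    write: Callable[[], None],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> int:
    """Call write() until it succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            write()
            return attempt
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff with jitter so competing writers spread out.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
    raise ValueError("attempts must be at least 1")
```

Incompatible updates, like the schema overwrite example above, will keep failing no matter how often they are retried, so the attempt count should stay small.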
(Created an issue to document this: https://github.com/delta-io/delta-rs/issues/1091)