
Borislav Blagoev

03/23/2023, 3:21 PM
Hello everyone, I'm working on enabling S3 Multi-Cluster Writes in my Databricks workspace. I have confirmed that I have read/write permissions on the DynamoDB table, and have successfully tested those permissions using boto3 in a Databricks notebook. I have also set the Spark configs as per step 4 of the documentation. The runtime version on my cluster is 11.3 LTS, and I have installed the following dependencies: io.delta:delta-storage-s3-dynamodb:2.1.0 and com.amazonaws:aws-java-sdk:1.12.427. I ran a script that reads a Delta table and saves it to a different location, but there are no records in the DynamoDB table. I'm wondering if I'm missing something or if there's a step I haven't completed. I would appreciate any insights or advice on how to successfully enable S3 Multi-Cluster Writes. Here is the link to the documentation that I'm following: https://delta.io/blog/2022-05-18-multi-cluster-writes-to-delta-lake-storage-in-s3/
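For anyone reading later: outside Databricks, step 4 of that blog post amounts to Spark configs roughly like the sketch below. The table name, region, bucket path, and Delta/Spark version pairing here are placeholder assumptions, not values confirmed in this thread.

```python
# Minimal sketch of the open-source Delta Lake S3 multi-cluster write setup
# from the linked blog post, for running OUTSIDE Databricks. Table name,
# region, and paths are placeholder assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-multi-cluster-writes")
    # DynamoDB-backed LogStore plus a matching Delta version (2.1.0 pairs
    # with Spark 3.3.x).
    .config("spark.jars.packages",
            "io.delta:delta-core_2.12:2.1.0,"
            "io.delta:delta-storage-s3-dynamodb:2.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # Route s3a:// Delta commits through the DynamoDB-backed LogStore.
    .config("spark.delta.logStore.s3a.impl",
            "io.delta.storage.S3DynamoDBLogStore")
    # DynamoDB table used for commit coordination.
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName",
            "delta_log")   # hypothetical table name
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region",
            "us-east-1")   # hypothetical region
    .getOrCreate()
)

# Any Delta write to s3a:// now records each commit in DynamoDB first.
spark.range(10).write.format("delta").mode("overwrite") \
    .save("s3a://my-bucket/tmp/delta-mcw-test")  # hypothetical path
```

As the replies below explain, none of this is needed on a Databricks cluster, where commit coordination is handled by a managed service.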

Nick Karpov

03/23/2023, 3:50 PM
multi-cluster writes within Databricks are managed for you - supported out of the box

Borislav Blagoev

03/23/2023, 4:02 PM
Even on S3?

Nick Karpov

03/23/2023, 4:04 PM
yes, S3 in particular, because the other storage systems don't require any additional services

Borislav Blagoev

03/23/2023, 4:06 PM
Is there a default table in DynamoDB that I can check?
I'm looking for a way to confirm that this functionality works

Nick Karpov

03/23/2023, 4:09 PM
no, there's no DynamoDB table within Databricks https://docs.databricks.com/delta/s3-limitations.html
Databricks and Delta Lake support multi-cluster writes by default, meaning that queries writing to a table from multiple clusters at the same time won’t corrupt the table. For Delta tables stored on S3, this guarantee is limited to a single Databricks workspace.

Borislav Blagoev

03/23/2023, 4:28 PM
Two questions: 1. Do you know what the purpose of this documentation is, if this feature comes out of the box? https://delta.io/blog/2022-05-18-multi-cluster-writes-to-delta-lake-storage-in-s3/ 2. It looks like the spark.databricks.delta.multiClusterWrites.enabled config is set to true (by default). But is it possible to confirm somehow that this feature works?
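On the "confirm it works" question: for the open-source DynamoDB LogStore path, one rough check is to scan the commit table after a write; on Databricks you can at most read back the config flag. A sketch, assuming the placeholder table name and region from the earlier sketch and a notebook-provided spark session:

```python
# Rough verification sketch. The "delta_log" table name and region are the
# placeholder assumptions from the config sketch above, not confirmed values.
import boto3

# On Databricks: read back the managed multi-cluster writes flag.
# (`spark` is the session a Databricks notebook provides.)
print(spark.conf.get("spark.databricks.delta.multiClusterWrites.enabled"))

# Outside Databricks: the OSS S3DynamoDBLogStore puts one item per Delta
# commit, so after a successful write the table should not be empty.
ddb = boto3.resource("dynamodb", region_name="us-east-1")  # hypothetical region
commit_table = ddb.Table("delta_log")                      # hypothetical name
resp = commit_table.scan(Limit=10)
print(f"found {resp['Count']} item(s)")
for item in resp.get("Items", []):
    print(item)
```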

Nick Karpov

03/23/2023, 4:31 PM
1. The purpose is to support multi-cluster writes on S3 outside Databricks (wouldn't be a great experience to ask customers to stand up their own service!) 2. It's a managed service so that you don't have to worry about it. The question should be in reverse: can you confirm the feature is not working? If you're encountering a bug or unexpected error, please contact Databricks support and let them know!

Borislav Blagoev

03/23/2023, 4:33 PM
I have faced issues with that feature, so I will definitely contact Databricks support. Thanks for the help!
I have one additional question. We have a specific situation where Databricks and non-Databricks jobs are writing to the same Delta table. How can we configure S3 Multi-Cluster Writes to work with both? We tried to enable that on Databricks, but it doesn't work.

Yousry Mohamed

03/23/2023, 10:23 PM
Not an answer to the last message, but sharing the name of the Databricks-provided service that handles multi-cluster write locking and coordination. I found it randomly when I was investigating the same topic. The service has some configuration options as well. https://docs.databricks.com/administration-guide/cloud-configurations/aws/s3-commit-service.html

Dominique Brezinski

03/24/2023, 6:50 PM
I don't think multi-cluster writes are supported between Databricks clusters and writers outside Databricks. The locking mechanism is different between the Databricks product and the open-source Delta Lake implementation. Multi-cluster writes in Databricks have been supported much longer and are well battle-tested. We have been using the feature from the very beginning without issue.
But Databricks support is the best place to ask about your issue.