Borislav Blagoev
03/23/2023, 3:21 PM
I'm trying to enable S3 Multi-Cluster Writes in my Databricks Workspace. I have confirmed that I have read/write permissions to the DynamoDB table, and have successfully tested the permissions using boto3 in a Databricks notebook.
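A minimal sketch of that kind of permissions probe (the table name delta_log, the region, and the tablePath/fileName key schema are assumptions taken from the linked setup guide, not details given in the thread):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")  # assumed region

# Key schema assumed from the S3DynamoDBLogStore setup docs:
# partition key "tablePath", sort key "fileName".
probe = {"tablePath": {"S": "permissions-probe"}, "fileName": {"S": "probe"}}

# PutItem exercises write permission.
dynamodb.put_item(TableName="delta_log", Item=probe)

# GetItem exercises read permission.
resp = dynamodb.get_item(TableName="delta_log", Key=probe)
print(resp.get("Item"))

# Remove the probe item afterwards.
dynamodb.delete_item(TableName="delta_log", Key=probe)
```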
I have also set the Spark configs as per step 4 of the documentation.
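If I'm reading step 4 right, those configs route Delta's transaction-log writes through the DynamoDB-backed LogStore, roughly like this (table name and region are placeholders; on Databricks these would normally go in the cluster's Spark config rather than notebook code, since the LogStore must be set before the session starts):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Repeat the .impl setting for each URI scheme in use (s3, s3a, s3n).
    .config("spark.delta.logStore.s3.impl", "io.delta.storage.S3DynamoDBLogStore")
    .config("spark.delta.logStore.s3a.impl", "io.delta.storage.S3DynamoDBLogStore")
    # DynamoDB table and region created for the log store (placeholders).
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-east-1")
    .getOrCreate()
)
```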
The runtime version on my cluster is 11.3 LTS, and I have installed the following dependencies: io.delta:delta-storage-s3-dynamodb:2.1.0 and com.amazonaws:aws-java-sdk:1.12.427.
I ran a script that reads a Delta table and saves it to a different location, but there are no records in the DynamoDB table. I'm wondering if I'm missing something or if there's a step I haven't completed.
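Roughly what that script looks like, with made-up paths:

```python
# Read an existing Delta table from S3 and write it out to a new location.
df = spark.read.format("delta").load("s3a://my-bucket/source-table")
df.write.format("delta").mode("overwrite").save("s3a://my-bucket/target-table")
```

Each commit made through the S3DynamoDBLogStore should leave an entry in the DynamoDB table, which is why the table staying empty after a write is surprising.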
I would appreciate any insights or advice on how to successfully enable S3 Multi-Cluster Writes. Here is the link to the documentation that I'm following: https://delta.io/blog/2022-05-18-multi-cluster-writes-to-delta-lake-storage-in-s3/

Nick Karpov
03/23/2023, 3:50 PM

Borislav Blagoev
03/23/2023, 4:02 PM

Nick Karpov
03/23/2023, 4:04 PM

Borislav Blagoev
03/23/2023, 4:06 PM

Nick Karpov
03/23/2023, 4:09 PM
Databricks and Delta Lake support multi-cluster writes by default, meaning that queries writing to a table from multiple clusters at the same time won’t corrupt the table. For Delta tables stored on S3, this guarantee is limited to a single Databricks workspace.

Borislav Blagoev
03/23/2023, 4:28 PM
The spark.databricks.delta.multiClusterWrites.enabled config is set to True (by default). But is it possible to confirm somehow that this feature works?
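One way to check, as a sketch (assumes a live Spark session plus the delta_log table name and region from the setup above):

```python
# Confirm the Databricks-side setting is on.
print(spark.conf.get("spark.databricks.delta.multiClusterWrites.enabled"))

# After a write, look for commit entries in the DynamoDB table.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")  # assumed region
resp = dynamodb.scan(TableName="delta_log", Limit=10)
print(resp["Count"])
```

Given the point above, though, an empty DynamoDB table on Databricks may simply mean the platform's own mechanism, rather than the OSS S3DynamoDBLogStore, is providing the multi-cluster guarantee.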
Nick Karpov
03/23/2023, 4:31 PM

Borislav Blagoev
03/23/2023, 4:33 PM

Yousry Mohamed
03/23/2023, 10:23 PM

Dominique Brezinski
03/24/2023, 6:50 PM