Nagendra Darla

04/06/2023, 5:44 AM
Hi, I am following this document to verify multi cluster writes on Amazon EMR. https://delta.io/blog/2022-05-18-multi-cluster-writes-to-delta-lake-storage-in-s3/ I do not see either the table or record getting created in DynamoDB. Am I missing anything? Any inputs are much appreciated. Below are my configurations
SparkSession spark = SparkSession.builder()
        .config("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .config("fs.AbstractFileSystem.s3.impl", "org.apache.hadoop.fs.s3a.S3A")
        .config("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .config("spark.delta.logStore.s3.impl", "io.delta.storage.S3DynamoDBLogStore")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-east-1")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.credentials.provider",
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
        .getOrCreate();
Hi, any inputs on this please?

Scott Sandre (Delta Lake)

04/08/2023, 3:07 AM
Did you perform a write?
Did you see your delta table get created in S3?

Nagendra Darla

04/10/2023, 8:35 PM
Yes, I performed a write on S3 on an existing Delta table with the above configurations. Data is getting written to S3, but I do not see a record created in DynamoDB.
Am I missing anything?
I see the configs being used in the Spark History Server:
1. spark.delta.logStore.s3.impl=io.delta.storage.S3DynamoDBLogStore
2. spark.io.delta.storage.S3DynamoDBLogStore.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain
3. spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-east-1
4. spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName=delta_log
5. spark.io.delta.storage.S3DynamoDBLogStore.provisionedThroughput.rcu=5
6. spark.io.delta.storage.S3DynamoDBLogStore.provisionedThroughput.wcu=5
7. spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
8. spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
Hi, any inputs on this please?

Scott Sandre (Delta Lake)

04/11/2023, 9:09 PM
Is the scheme of your files s3 or s3a?
You may want to try
.config("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
and
spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore
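Putting those two changes together, a sketch of the full builder with every property switched to the s3a scheme (assuming the table paths start with s3a://; the remaining properties mirror the original snippet above):
SparkSession spark = SparkSession.builder()
        // s3a filesystem and credentials, matching s3a:// table paths
        .config("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .config("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        // LogStore registered under the s3a scheme, not s3
        .config("spark.delta.logStore.s3a.impl", "io.delta.storage.S3DynamoDBLogStore")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-east-1")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.credentials.provider",
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
        .getOrCreate();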

Nagendra Darla

04/11/2023, 9:59 PM
Thanks, let me try.
We are using s3a.

Scott Sandre (Delta Lake)

04/11/2023, 10:21 PM
Then you definitely need to use .s3a. instead of .s3 in the confs above.

Nagendra Darla

04/12/2023, 8:02 AM
Thank you so much. I see the DynamoDB records getting created after changing them to ‘s3a’.
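For anyone wanting to double-check the same thing, a minimal sketch (the class name CheckDeltaLogTable is just illustrative) using the AWS SDK for Java v1, which EMR already bundles, to confirm the table exists and peek at the commit entries written by S3DynamoDBLogStore; the table name and region are the values from the configs above:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;

public class CheckDeltaLogTable {
    public static void main(String[] args) {
        // Same default credentials chain as the Spark configs above
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard()
                .withRegion("us-east-1")
                .build();

        // Table status should be ACTIVE once the LogStore has created it
        System.out.println(ddb.describeTable("delta_log").getTable().getTableStatus());

        // Print a few of the entries recorded for recent commits
        ddb.scan(new ScanRequest().withTableName("delta_log").withLimit(5))
                .getItems()
                .forEach(System.out::println);
    }
}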

Scott Sandre (Delta Lake)

04/13/2023, 7:55 PM
@Nagendra Darla yay! Glad I could help!
👍 1