
Jeremy Jordan

04/20/2023, 12:53 PM
Hi, I'm trying to implement the multi-cluster setup so that I can have multiple EMR clusters writing to the same Delta table (in S3) at the same time. I followed all the instructions on that page and ran a couple of parallel test jobs in EMR. I confirmed that locks were being acquired for the table (for entries in the transaction log). However, one of the jobs failed with a
ConcurrentAppendException
error. Is this expected? Do I need to implement retries in my write logic? I figured that after enabling the multi-cluster setup, a cluster would wait for the lock to be released and then write a new transaction, so I didn't think I would have to worry about retries. Do I have something misconfigured?
Here's the log event that I see
ConcurrentAppendException: Files were added to partition [event_date=2023-04-14] by a concurrent update. Please try the operation again.

Conflicting commit: 
{
    "timestamp": 1681939596227,
    "operation": "OPTIMIZE",
    "operationParameters": {
        "predicate": [
            "(event_date = '2023-04-14')"
        ],
        "zOrderBy": [
            "customer_id"
        ]
    },
    "readVersion": 5,
    "isolationLevel": "SnapshotIsolation",
    "isBlindAppend": false,
    "operationMetrics": {
        "numRemovedFiles": "...",
        "numRemovedBytes": "...",
        "p25FileSize": "...",
        "minFileSize": "...",
        "numAddedFiles": "...",
        "maxFileSize": "...",
        "p75FileSize": "...",
        "p50FileSize": "...",
        "numAddedBytes": "..."
    },
    "engineInfo": "Apache-Spark/3.2.0-amzn-0 Delta-Lake/2.0.0",
    "txnId": "..."
}
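For reference, a minimal sketch of the multi-cluster configuration being described, using the Spark configs from the Delta docs for the S3 DynamoDB LogStore. The table name, region, and `s3a` scheme here are placeholder assumptions (EMR setups may use `s3` instead):

```python
from pyspark.sql import SparkSession

# Hedged sketch: enable Delta's multi-cluster S3 writes by routing commits
# through the DynamoDB-backed LogStore (delta-storage-s3-dynamodb artifact).
# Table name and region below are placeholders, not values from the thread.
spark = (
    SparkSession.builder
    .config("spark.delta.logStore.s3a.impl",
            "io.delta.storage.S3DynamoDBLogStore")
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName",
            "delta_log")
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region",
            "us-east-1")
    .getOrCreate()
)
```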

Nick Karpov

04/20/2023, 9:27 PM
Is this expected? Do I need to implement retries in my write logic?
Yup, check the matrix here: https://docs.delta.io/latest/concurrency-control.html#write-conflicts. The multi-cluster setup provides atomicity for the actual commit operation (not a lock over the entire transaction) for when two writers attempt the same commit file (
00x.json
) at the exact same time. Without this setup, one commit would overwrite the other and the table would be corrupted.

Jeremy Jordan

04/20/2023, 9:55 PM
Gotcha, thanks for this information. Is it a common pattern to establish some retry logic around writes for our jobs? So if we have multiple EMR clusters writing to the same Delta table partition, some of them might raise a ConcurrentAppendException, which we would catch and then retry the operation with some sort of backoff?
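That retry-with-backoff pattern could look something like this minimal sketch. It is generic over the exception type so it runs standalone; in a real PySpark job you would pass `ConcurrentAppendException` (importable from `delta.exceptions` in the delta-spark package) as `retryable`. The `write_batch` callable and all parameter names are hypothetical:

```python
import random
import time


def write_with_retry(write_batch, retryable=(Exception,),
                     max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `write_batch`, retrying on conflict with exponential backoff.

    `retryable` would typically be (delta.exceptions.ConcurrentAppendException,)
    in a PySpark job. `sleep` is injectable for testing.
    """
    for attempt in range(max_retries):
        try:
            return write_batch()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the conflict to the caller
            # Exponential backoff with jitter so parallel writers
            # don't all retry in lockstep and collide again.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Note that retrying only makes sense for operations that are still valid against the new table version (e.g. blind appends); the concurrency-control docs linked above describe which operation pairs conflict.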