Rahul Sharma

01/14/2023, 11:29 AM
Hi all, when I do a full load and run both the raw and refine streaming jobs in parallel (raw fetches data from Kafka, refine performs an upsert on the raw data), I sometimes get the error below:
BaseException: The metadata of the Delta table has been changed by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1673694760672,"operation":"SET TBLPROPERTIES","operationParameters":{"properties":{"delta.logRetentionDuration":"interval 72 hours","delta.deletedFileRetentionDuration":"interval 72 hours","delta.compatibility.symlinkFormatManifest.enabled":"true"}},"readVersion":0,"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.2.1-amzn-0 Delta-Lake/2.0.0","txnId":"6661060e-82a8-47a7-a7f3-95f61b67124a"}
Can anyone let me know how to fix it?


01/14/2023, 11:52 PM
Please create the table structure (and set its table properties) upfront, then run these parallel writes.
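For example, a one-time setup job run before either stream starts means neither streaming job ever needs to issue `SET TBLPROPERTIES`. This is a sketch only; the table name, columns, and location are placeholders, and the properties are the ones from the conflicting commit in the error above:

```sql
-- One-time setup, run before starting the raw/refine streams.
-- Table name, columns, and LOCATION are illustrative placeholders.
CREATE TABLE IF NOT EXISTS raw_events (
  event_id STRING,
  payload  STRING,
  event_ts TIMESTAMP
)
USING DELTA
LOCATION 's3://my-bucket/raw_events'
TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 72 hours',
  'delta.deletedFileRetentionDuration' = 'interval 72 hours',
  'delta.compatibility.symlinkFormatManifest.enabled' = 'true'
);
```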

Ryan Johnson

01/17/2023, 5:22 PM
A transaction that sets table properties or other table metadata conflicts with every other transaction -- even blind appends. If your workload doesn't need to change table properties very often, then yes, just make sure to apply those changes first. Meanwhile, though -- a simple retry loop is usually the best solution for sporadic transaction failures. There's only a problem to solve if the retries fail consistently, or if the failures are frequent or expensive enough to cause missed SLAs, etc.
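A retry loop along these lines is one way to absorb sporadic conflicts. This is a generic sketch -- `commit_fn`, `retries`, and `base_delay` are names invented here, and in a real job `retryable` would be Delta's concurrency exceptions rather than a bare `Exception`:

```python
import time

def commit_with_retry(commit_fn, retries=5, base_delay=1.0,
                      retryable=(Exception,)):
    """Retry a Delta commit on sporadic concurrency conflicts.

    All parameter names are illustrative. In practice, pass Delta's
    concurrency error types (e.g. ConcurrentModificationException)
    as `retryable` so unrelated failures are not retried.
    """
    for attempt in range(retries):
        try:
            return commit_fn()
        except retryable:
            if attempt == retries - 1:
                raise  # consistent failure: surface it, don't mask it
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

The backoff matters: retrying immediately tends to re-collide with the same concurrent writer, while spacing the attempts out lets the conflicting transaction finish first.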