guru moorthy

06/26/2023, 8:40 AM
Hi team, will I get duplicate records if I rerun a partially failed merge command (some records were already updated or inserted) against the same target Delta table, given that the source Delta table is updated on each run? Note: each record has a primary key, and the merge is performed on that key. Assume the following scenario:
Run-1: source S1 has 10 records (5 updates, 5 inserts) and the target has 100 records. After the merge, the target Delta table contains 105 records.
Run-2: source S2 has 5 records (2 updates, 3 inserts) and the target has 105 records. The merge fails after writing partially (2 records inserted and 1 record updated).
Because Run-2 failed, we reran the job. This time S2 (the source Delta table was updated in the meantime) contains 7 records (3 updates, 4 inserts): the previous 5 records (2 updates, 3 inserts) plus 2 new records (1 update, 1 insert).
If I run the merge now on the target Delta table, will it insert the already-inserted records again and create duplicates? Any help on this would be appreciated.
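The scenario above can be sketched in plain Python to make the question concrete. This is only a toy model of a keyed upsert, not the Delta Lake API: the `merge_upsert` helper, the key ranges, and the `"v1"`/`"v2"` values are all hypothetical, and it assumes the merge condition matches on a unique primary key with at most one source row per key. The partially applied state from Run-2 is taken as given from the description rather than derived from how Delta commits writes.

```python
def merge_upsert(target, source):
    """Upsert source rows into target, matching on the primary key.

    target: dict mapping primary key -> row value (one row per key)
    source: list of (key, value) pairs, assumed to hold one row per key
    Returns (updated_count, inserted_count).
    """
    updated = inserted = 0
    for key, value in source:
        if key in target:
            updated += 1   # key already present: treated as an update
        else:
            inserted += 1  # key missing: treated as an insert
        target[key] = value
    return updated, inserted


# Target after Run-1: 105 rows (hypothetical keys 1..105).
target = {key: "v1" for key in range(1, 106)}

# Assumed partial state left by the failed Run-2: 2 inserts and 1 update landed.
target[106] = "v2"
target[107] = "v2"
target[1] = "v2"

# Rerun source S2 with 7 rows: originally 3 updates (keys 1..3), 4 inserts (keys 106..109).
s2 = [(key, "v2") for key in (1, 2, 3, 106, 107, 108, 109)]

updated, inserted = merge_upsert(target, s2)
print(len(target), updated, inserted)
```

In this model the two rows that were already inserted are matched by key on the rerun and updated in place, so the target ends with one row per key. Whether the same holds on a real table depends on the merge condition and on the source having no duplicate keys.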

Michael Nacey

06/28/2023, 12:42 PM
You might want this: https://docs.delta.io/latest/delta-batch.html#id22 "Idempotent Writes"

guru moorthy

06/28/2023, 12:43 PM
But I guess that's not supported with merge.