
Theo LEBRUN

04/30/2023, 11:55 PM
Hey there, I’m facing an interesting problem where my end-users often rewrite a partition with the same data multiple times per day. The problem is that the transaction log gets “flooded” with writes while the data in the table stays the same. End-users are using notebooks: one code cell writes the data, and it gets executed often because they rerun the whole notebook. I wonder if there is a “smart” way to avoid writing data when it’s the same, besides comparing the data row by row (which sounds a bit complex)… Let me know if anyone has any ideas or tips, thank you!

Yousry Mohamed

05/01/2023, 1:05 AM
Check out the idempotent writes feature. I have not tried it for writes on single partitions, but it should work fine. If you can find a valid combination of appId and txnVersion, that would prevent multiple identical writes. https://docs.databricks.com/delta/idempotent-writes.html https://towardsdatascience.com/idempotent-writes-to-delta-lake-tables-96f49addd4aa
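
A minimal sketch of what that suggestion could look like in PySpark, assuming the `txnAppId` and `txnVersion` writer options described in the linked docs. The helper names, the `replaceWhere` partition filter, and the choice of app id and version are hypothetical, and (as noted above) the combination with a single-partition overwrite is untested here.

```python
def idempotent_write_options(app_id: str, version: int) -> dict:
    """Build the writer options for a Delta idempotent write.

    app_id identifies the writer (for example the notebook cell plus the
    partition), version identifies one logical batch from that writer.
    Both naming schemes are assumptions of this sketch.
    """
    if version < 0:
        raise ValueError("version must be non-negative")
    # Spark writer options are strings, so the version is serialized here.
    return {"txnAppId": app_id, "txnVersion": str(version)}


def write_partition_once(df, path: str, partition_filter: str,
                         app_id: str, version: int) -> None:
    """Overwrite one partition; a repeat with the same app id and version
    is expected to be skipped by Delta instead of creating a new commit."""
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", partition_filter)  # e.g. "date = '2023-04-30'"
        .options(**idempotent_write_options(app_id, version))
        .save(path)
    )
```

The catch is that Delta does not look at the data: if the notebook is rerun with different data but the same app id and version, that write is skipped too, so the version has to be bumped whenever the content really changes.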

Theo LEBRUN

05/01/2023, 1:25 AM
Awesome, thanks!

Christopher Grant

05/01/2023, 12:25 PM
can MERGE be used instead? partition overwrites come from a time where you couldn't do row-level changes - and now you can.

Theo LEBRUN

05/01/2023, 12:41 PM
Merge can be used, but it will create a new version in the transaction log, and I want to avoid that if the data is the same. I don’t want to see a lot of identical versions when describing the history of a table.

Christopher Grant

05/01/2023, 5:55 PM
it's true, MERGE without any changes to the data will still generate a new transaction. would it be satisfactory if you could filter out these non-changing transactions from the history?

Theo LEBRUN

05/01/2023, 7:43 PM
Yeah they can be filtered out when reading the history based on operationMetrics
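
A rough sketch of that filtering, assuming MERGE commits report `numTargetRowsInserted`, `numTargetRowsUpdated` and `numTargetRowsDeleted` in `operationMetrics` (values arrive as strings). The function names are hypothetical, and commits with missing metrics are deliberately kept rather than treated as duds.

```python
# Metrics that indicate a MERGE actually changed rows (assumed metric names).
MERGE_CHANGE_METRICS = (
    "numTargetRowsInserted",
    "numTargetRowsUpdated",
    "numTargetRowsDeleted",
)


def is_dud_merge(operation: str, operation_metrics: dict) -> bool:
    """Return True for a MERGE commit whose change metrics are all "0"."""
    if operation != "MERGE":
        return False
    # A missing metric returns None, which is not "0", so the commit is kept.
    return all(operation_metrics.get(name) == "0" for name in MERGE_CHANGE_METRICS)


def meaningful_history(spark, path: str) -> list:
    """Collect the table history and drop the no-op MERGE commits."""
    from delta.tables import DeltaTable  # imported lazily; needs delta-spark

    rows = DeltaTable.forPath(spark, path).history().collect()
    return [
        row for row in rows
        if not is_dud_merge(row["operation"], row["operationMetrics"] or {})
    ]
```

Filtering after `collect()` keeps the sketch simple; for tables with a very long history the same check could be pushed into a DataFrame filter instead.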

Christopher Grant

05/01/2023, 9:13 PM
i think MERGE is better then? you can just filter out the "dud" commits. if you do a partition overwrite, it's harder to differentiate commits that actually did something from commits that just overwrote but didn't change anything
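
For the "dud" commits to be recognizable, the MERGE has to leave unchanged rows alone. One way to sketch that, assuming the Delta Lake Python `DeltaTable` merge builder and hypothetical helper names, aliases and column lists, is to update a matched row only when at least one value column differs under Spark's null-safe equality operator `<=>`:

```python
def changed_rows_condition(columns, target: str = "t", source: str = "s") -> str:
    """Build a SQL condition that is true when any listed column differs
    between the target and source row (null-safe comparison)."""
    if not columns:
        raise ValueError("at least one column is required")
    return " OR ".join(
        f"NOT ({target}.`{col}` <=> {source}.`{col}`)" for col in columns
    )


def merge_only_changes(spark, path: str, source_df, key_condition: str,
                       value_columns) -> None:
    """Upsert source_df into the table, touching only rows that differ."""
    from delta.tables import DeltaTable  # imported lazily; needs delta-spark

    (
        DeltaTable.forPath(spark, path).alias("t")
        .merge(source_df.alias("s"), key_condition)  # e.g. "t.id = s.id"
        .whenMatchedUpdateAll(condition=changed_rows_condition(value_columns))
        .whenNotMatchedInsertAll()
        .execute()
    )
```

With identical data the commit is still written, but it should report zero inserted and updated rows, which is what makes it filterable. This sketch does not delete target rows that are absent from the source, so it is not a full replacement for a partition overwrite.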