
Priyal Chaudhari

03/16/2023, 3:03 PM
Hi Team, I am looking for an answer: can we write to a Delta table while we are running VACUUM on it? I got conflicting results when searching and haven't found a doc that says whether we should write simultaneously or not. Can anyone share a link where the standard practice is recommended? We are trying to build a daily vacuum-and-optimize job, and the debate is whether we can keep writing or whether we have to pause the streams that write to that table.

JosephK (exDatabricks)

03/16/2023, 3:05 PM
You can write to a table and vacuum it at the same time.
Vacuum only removes old files that are no longer referenced by the current table version, and writes only create new files, so the two don't touch the same data.
👍 1
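To make the reasoning above concrete, here is a toy model (not Delta's actual implementation) of the selection rule VACUUM applies: a file is a deletion candidate only if it is both unreferenced by the current table version *and* older than the retention window, which is why files being written by a concurrent stream are never touched.

```python
from datetime import datetime, timedelta

def files_vacuum_would_delete(files, retention_hours=168, now=None):
    """Toy model of VACUUM's candidate selection.

    Each file is a dict with keys 'path', 'referenced' (is it part of
    the current table version?), and 'modified' (datetime). Only files
    that are BOTH unreferenced AND older than the retention window
    (default 168 hours = 7 days) are deleted; everything a concurrent
    writer just produced is referenced and recent, so it survives.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(hours=retention_hours)
    return [f["path"] for f in files
            if not f["referenced"] and f["modified"] < cutoff]

# Example: an old tombstoned file is deleted; a file a stream wrote
# moments ago is kept, even if it is also unreferenced yet.
```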

Priyal Chaudhari

03/16/2023, 3:06 PM
Interesting. Thank you for clarifying. And what about OPTIMIZE? Can we also write while optimizing a Delta table?

JosephK (exDatabricks)

03/16/2023, 3:10 PM
That's two write jobs, so they can conflict. You can run OPTIMIZE per partition, or with a predicate, and then it should be safe.
👍 1
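A scoped OPTIMIZE like the one suggested above can be issued as SQL with a WHERE predicate on a partition column. Below is a minimal sketch; the table name `events` and the predicate are hypothetical, and the helper that builds the statement is illustrative, not part of any Delta API. Actually executing it requires a running Spark session with Delta Lake.

```python
def optimize_sql(table: str, predicate: str) -> str:
    # Illustrative helper: build an OPTIMIZE statement scoped by a
    # predicate so compaction only rewrites one partition's files,
    # reducing the chance of conflicting with concurrent writers
    # touching other partitions.
    return f"OPTIMIZE {table} WHERE {predicate}"

sql = optimize_sql("events", "event_date = '2023-03-15'")
# spark.sql(sql)  # requires a live Spark session with Delta Lake
```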

Priyal Chaudhari

03/16/2023, 3:12 PM
ok perfect thank you

Dominique Brezinski

03/16/2023, 6:38 PM
Is your stream write an append or an update/merge? If it is the former, you will likely not see a transaction failure between OPTIMIZE and the stream job. If one fails because the table version gets bumped underneath it, the retry is fast, since there are no irreconcilable conflicts, and it will succeed. However, if it is an update/merge, the restructuring of files by OPTIMIZE can create a transaction conflict that cannot be resolved without a complete retry. We stream appends to tables that are being optimized all the time without failure.

Priyal Chaudhari

03/16/2023, 6:39 PM
Ahh, ok, good to know. Our streams are all update/merge,
so we will have to pause for OPTIMIZE. For VACUUM we may get away without pausing.

Dominique Brezinski

03/16/2023, 6:41 PM
Correct.
We add a modulo-N batchId clause in the foreachBatch to execute the OPTIMIZE. That way we don't have to muck with stopping the stream, optimizing, and restarting the stream.
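The modulo-N pattern described above can be sketched as follows. This is an illustrative example, not the poster's actual code: the table name, the cadence of 20 batches, and the elided MERGE are all assumptions, and the Spark/Delta calls (which need a live session) are left as comments. The idea is that because OPTIMIZE runs inside the stream's own foreachBatch, it is serialized with the merge rather than racing it.

```python
OPTIMIZE_EVERY_N_BATCHES = 20  # assumed cadence; tune for your workload

def should_optimize(batch_id: int, n: int = OPTIMIZE_EVERY_N_BATCHES) -> bool:
    # Run OPTIMIZE on every Nth micro-batch so compaction happens
    # inside the stream's own processing loop, with no pause/restart.
    return batch_id % n == 0

def process_batch(batch_df, batch_id):
    # 1. Apply the update/merge for this micro-batch (details elided):
    # batch_df.sparkSession.sql("MERGE INTO events ...")
    #
    # 2. Periodically compact. Because this runs sequentially after the
    #    merge in the same foreachBatch call, it cannot conflict with it.
    if should_optimize(batch_id):
        # Hypothetical table name; uses delta-spark's DeltaTable API:
        # from delta.tables import DeltaTable
        # DeltaTable.forName(batch_df.sparkSession, "events") \
        #     .optimize().executeCompaction()
        pass

# query = stream.writeStream.foreachBatch(process_batch).start()
```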