
Gerhard Brueckl

02/15/2023, 11:50 AM
As VACUUM deletes orphaned and outdated files, has anyone ever had a case where
• a long UPDATE/MERGE/DELETE operation was running that creates data files (parquet)
• VACUUM is run at about the same time and cleans up those temporary orphaned files?
Orphaned in the sense that the log file has not yet been written because the concurrent UPDATE/MERGE/DELETE operation was still running.

abhijeet_naib

02/16/2023, 4:12 PM
I have seen behaviour like this.
I didn't think it would delete the temporary orphaned files.
@JosephK (exDatabricks), can you please help with the question above? I have sometimes seen VACUUM deleting more files than it marks as safe to delete.

JosephK (exDatabricks)

02/16/2023, 4:32 PM
I’ve never seen or heard of it. MERGE won’t tombstone files until it’s finished, so VACUUM won’t interact with it. Did you file a ticket? What system were you using?

Gerhard Brueckl

02/16/2023, 4:35 PM
It's just a question for now. Technically, if data files are written but the corresponding transaction is not yet finished, a VACUUM running at the same time could consider those files orphaned, because they do not belong to a Delta log entry yet (the entry has not been written).

JosephK (exDatabricks)

02/16/2023, 4:48 PM
Someone confirmed that it would clean up uncommitted files. Try not to run VACUUM with RETAIN 0 HOURS.

Gerhard Brueckl

02/16/2023, 4:56 PM
So the threshold/retentionPeriod is applied to both: outdated files based on the date in the Delta log, and orphaned files based on the creation/modification date?

JosephK (exDatabricks)

02/16/2023, 4:57 PM
Yes. Retain 1 hour should be safe, assuming you don’t have any writes or merges that take longer than an hour.
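
The rule discussed in this thread can be sketched in plain Python. This is a simplified model for illustration only, not Delta Lake's actual implementation: the names `DataFile` and `files_to_vacuum` and all file names are hypothetical. It assumes a file is deleted only if it is neither referenced by the current table version nor recently tombstoned, and its modification time is older than the retention cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataFile:
    path: str
    modified_at: datetime  # file-system modification time


def files_to_vacuum(all_files, live_paths, tombstones, now, retention):
    """Return the paths a VACUUM-like cleanup would delete (simplified model).

    all_files:  every data file found in the table directory
    live_paths: paths referenced by the current table version
    tombstones: {path: time the file was logically removed in the log}
    """
    cutoff = now - retention
    # Live files and files removed after the cutoff are protected.
    protected = set(live_paths) | {
        path for path, removed_at in tombstones.items() if removed_at >= cutoff
    }
    # Everything else is deleted only if it is also older than the cutoff.
    return sorted(
        f.path
        for f in all_files
        if f.path not in protected and f.modified_at < cutoff
    )


now = datetime(2023, 2, 16, 17, 0)
files = [
    DataFile("live.parquet", now - timedelta(days=10)),
    DataFile("old_tombstoned.parquet", now - timedelta(days=10)),
    DataFile("recent_tombstoned.parquet", now - timedelta(days=10)),
    # Written 20 minutes ago by a MERGE that has not committed yet,
    # so no log entry references it.
    DataFile("inflight.parquet", now - timedelta(minutes=20)),
]
live = {"live.parquet"}
tombstones = {
    "old_tombstoned.parquet": now - timedelta(days=8),
    "recent_tombstoned.parquet": now - timedelta(minutes=30),
}

one_hour = files_to_vacuum(files, live, tombstones, now, timedelta(hours=1))
zero_hours = files_to_vacuum(files, live, tombstones, now, timedelta(0))
print(one_hour)
print(zero_hours)
```

In this model, a one-hour retention leaves `inflight.parquet` alone because it is younger than the cutoff, while a zero-hour retention removes it along with both tombstoned files. That matches the advice above: the retention period has to be longer than the longest-running write.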