Chetan Joshi

06/30/2023, 8:14 PM
Hello Team, I have a Delta table (version 2.0.1) of almost 1.5 PB that has never been vacuumed. I am planning a cleanup to reduce the storage footprint. Would it be advisable to manually delete the unreferenced files (identified by a vacuum dry run), since vacuum will block Delta write operations for hours? Thanks in advance!

JosephK (exDatabricks)

06/30/2023, 8:18 PM
If it was advisable, why would they have created a vacuum command in the first place?

Dominique Brezinski

06/30/2023, 9:36 PM
Vacuum is non-blocking and safe to run alongside active writes, as long as you don't set the retention period too short. I advise just running with the default, and you should be OK.
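For reference, here is a minimal sketch of that advice: preview with a dry run first, then vacuum with the default retention. It only builds the Delta Lake `VACUUM` SQL strings and does not need Spark to run. The helper name `build_vacuum_sql` and the table path are hypothetical, and refusing anything under 168 hours (Delta's 7-day default) is a choice made for this sketch, not a rule of the library.

```python
from typing import Optional

# Delta's default retention for unreferenced files is 7 days (168 hours).
DEFAULT_RETENTION_HOURS = 168


def build_vacuum_sql(table_path: str,
                     retain_hours: Optional[int] = None,
                     dry_run: bool = False) -> str:
    """Build a Delta Lake VACUUM statement for a path-based table.

    Hypothetical helper, not part of the Delta API. retain_hours=None
    leaves out the RETAIN clause so the table's default retention applies.
    """
    if "`" in table_path:
        raise ValueError("table_path must not contain backticks")
    if retain_hours is not None and retain_hours < DEFAULT_RETENTION_HOURS:
        # A shorter retention can delete files that concurrent readers or
        # writers still need, so this sketch refuses it outright.
        raise ValueError("retain_hours below the 7-day default is refused here")

    sql = f"VACUUM delta.`{table_path}`"
    if retain_hours is not None:
        sql += f" RETAIN {retain_hours} HOURS"
    if dry_run:
        sql += " DRY RUN"
    return sql


# Hypothetical table location, for illustration only.
path = "s3://example-bucket/events"
preview_sql = build_vacuum_sql(path, dry_run=True)  # lists files, deletes nothing
vacuum_sql = build_vacuum_sql(path)                 # default retention
print(preview_sql)
print(vacuum_sql)
```

With a SparkSession that has the Delta extensions configured, each string could be passed to `spark.sql(...)`. Running the dry run first shows which files would be removed before anything is deleted.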

Chetan Joshi

06/30/2023, 10:18 PM
I was misinformed somewhere about writes getting blocked. Thanks @Dominique Brezinski @JosephK (exDatabricks)!

sudo

07/01/2023, 12:59 AM
@Chetan Joshi Curious, what kind of data files add up to 1.5 PB?

Dominique Brezinski

07/01/2023, 2:09 PM
He is saying the total table size is 1.5PB. I have 20PB tables of just log data.
👍 1

sudo

07/01/2023, 4:01 PM
Thanks