I have a Delta table (version 2.0.1), almost 1.5 PB in size, that has never been vacuumed. I am planning a cleanup to reduce its storage footprint.
Would it be advisable to delete the unreferenced files manually (as identified by a vacuum dry run), since vacuum would block Delta write operations for hours?
Thanks in advance!
06/30/2023, 8:18 PM
If it was advisable, why would they have created a vacuum command in the first place?
06/30/2023, 9:36 PM
Vacuum is non-blocking and safe to use with active writes, as long as you don’t set the retention period too short. I advise just running with the default (7 days), and you should be OK.
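For anyone who wants to script this, here is a minimal sketch of the advice above: it only builds the `VACUUM` SQL text (with optional `RETAIN ... HOURS` and `DRY RUN` clauses) and refuses a retention shorter than the 7-day default. The helper name `build_vacuum_sql` and the table name `logs.events` are made up for illustration, and actually running the statement assumes a Delta-enabled SparkSession, which is not shown here.

```python
from typing import Optional

# Delta Lake's default VACUUM retention is 7 days, expressed here in hours.
DEFAULT_RETENTION_HOURS = 7 * 24


def build_vacuum_sql(table: str,
                     retain_hours: Optional[int] = None,
                     dry_run: bool = False) -> str:
    """Build a VACUUM statement for a Delta table (hypothetical helper).

    retain_hours=None omits the RETAIN clause, so the table's default
    retention applies. Values below the 7-day default are rejected here
    as a conservative guard for tables with active writers.
    """
    parts = ["VACUUM", table]
    if retain_hours is not None:
        if retain_hours < DEFAULT_RETENTION_HOURS:
            raise ValueError(
                f"retain_hours={retain_hours} is shorter than the "
                f"{DEFAULT_RETENTION_HOURS}-hour default"
            )
        parts.append(f"RETAIN {retain_hours} HOURS")
    if dry_run:
        # DRY RUN only lists the files that would be deleted.
        parts.append("DRY RUN")
    return " ".join(parts)


# Preview first, then run with the default retention.
preview_sql = build_vacuum_sql("logs.events", dry_run=True)
vacuum_sql = build_vacuum_sql("logs.events")
print(preview_sql)
print(vacuum_sql)
```

With a Delta-enabled session (assumed to be named `spark`), each statement could then be passed to `spark.sql(...)`. Reviewing the dry-run listing before the real run is a cautious choice for a table that has never been vacuumed.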
06/30/2023, 10:18 PM
I had picked up the mistaken idea somewhere that writes get blocked. Thanks @Dominique Brezinski @JosephK (exDatabricks)!
07/01/2023, 12:59 AM
@Chetan Joshi curious, what kind of 1.5 PB data files are these?
07/01/2023, 2:09 PM
He is saying the total table size is 1.5 PB. I have 20 PB tables of just log data.