Martin Beaussart
04/19/2023, 9:01 AM
Hanan Shteingart
04/19/2023, 9:17 AM
Martin Beaussart
04/19/2023, 10:02 AM
VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold.
So I think this will definitely solve my problem, because as I understand it, running VACUUM on a Delta table deletes the old data files that are no longer part of the latest state of the table and only keeps the ones that reflect the current state of the data. Am I correct?
Also, I am using PySpark, and I saw in the docs that this can be done from Python as well. Here is the link to the doc for anyone interested: https://docs.delta.io/latest/delta-utility.html#language-python:~:text=table%20to%20Delta-,Remov[…]0behavior%2C%20see%20Data%20retention.,-Important
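For reference, a minimal sketch of what this might look like in PySpark, assuming the `delta-spark` package is installed and the Spark session is configured for Delta. The helper names `vacuum_table` and `run_example` and the path `/tmp/delta/events` are made up for illustration; `DeltaTable.forPath` and `DeltaTable.vacuum` come from the Delta Lake Python API.

```python
from typing import Optional


def vacuum_table(delta_table, retention_hours: Optional[float] = None):
    """Run VACUUM on a DeltaTable handle (hypothetical helper).

    With retention_hours=None, Delta falls back to the table's default
    retention threshold (7 days unless configured otherwise).
    """
    if retention_hours is None:
        return delta_table.vacuum()
    # Only unreferenced files older than this many hours are removed.
    return delta_table.vacuum(retention_hours)


def run_example(path: str = "/tmp/delta/events"):  # hypothetical table path
    # Imported here so vacuum_table can be used without Spark installed.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = (
        SparkSession.builder.appName("vacuum-example")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )
    table = DeltaTable.forPath(spark, path)
    vacuum_table(table)        # default retention threshold
    vacuum_table(table, 168)   # explicit threshold: 168 hours = 7 days
```

Calling `run_example()` against a real table path would run VACUUM twice, once with the default threshold and once with an explicit one. Note that after a vacuum, time travel to versions older than the retention threshold may no longer work, since the data files those versions relied on are gone.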