Martin Beaussart04/19/2023, 9:01 AM
Hanan Shteingart04/19/2023, 9:17 AM
Martin Beaussart04/19/2023, 10:02 AM
So I think this will definitely solve my problem because, as I understand it, running VACUUM on a Delta table will delete all the old data transaction files and keep only the ones that reflect the latest state of the data in the table. Am I correct? Also, I am using PySpark, and I saw in the docs that this can be done from Python as well. Here is the link to the doc for anyone interested: https://docs.delta.io/latest/delta-utility.html#language-python:~:text=table%20to%20Delta-,Remov[…]0behavior%2C%20see%20Data%20retention.,-Important
VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold.
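For anyone on PySpark, here is a rough sketch of how I would expect to call it, based on the `DeltaTable.forPath(...).vacuum(...)` API from the docs page above. The helper name `vacuum_table`, its parameters, and the example path are my own placeholders, not something from the docs, so adapt them to your setup.

```python
def vacuum_table(spark, table_path, retention_hours=None):
    """Run VACUUM on the Delta table stored at table_path.

    `vacuum_table`, `table_path`, and `retention_hours` are hypothetical
    names used for illustration only.
    """
    # Imported here so this helper can be defined even where the
    # delta-spark package is not installed yet.
    from delta.tables import DeltaTable

    delta_table = DeltaTable.forPath(spark, table_path)

    if retention_hours is None:
        # No argument: files older than the default retention
        # threshold (7 days) become eligible for removal.
        delta_table.vacuum()
    else:
        # Explicit threshold, expressed in hours.
        delta_table.vacuum(retention_hours)


# Example usage, assuming an existing SparkSession named `spark` that is
# configured for Delta and a table at a made-up path:
# vacuum_table(spark, "/tmp/delta/my_table")        # default retention
# vacuum_table(spark, "/tmp/delta/my_table", 168)   # 168 hours = 7 days
```

One thing to keep in mind: only files older than the retention threshold are removed, so recently replaced files stay in the directory until they age out. As far as I know, Delta also refuses a threshold below 168 hours unless you disable the `spark.databricks.delta.retentionDurationCheck.enabled` setting, and vacuuming removes the ability to time travel to versions older than the threshold.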