We want to keep the bronze layer data for X days (just in case parsing was done incorrectly) and then delete it. What is the preferred way of periodically deleting "old" data from a Delta table? The tables in question are stored in an Azure Data Lake Storage Gen2 container. Should we use an Azure Storage lifecycle rule to delete the data and then have VACUUM periodically clean up?
Jacek
07/04/2023, 11:44 AM
Unless a lifecycle rule understands the Delta Lake protocol (commands), I don't think it's the way to go. It should be DELETE every X days followed by VACUUM, IMHO.
👆 1
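For anyone landing here later, a minimal sketch of the DELETE-then-VACUUM routine suggested above. Everything named here is an assumption for illustration: the table `bronze.events`, the timestamp column `ingested_at`, the helper names, and the 168-hour VACUUM retention (7 days, used here as an example value). The cutoff literal is interpreted in the Spark session time zone, so adjust if yours is not UTC.

```python
from datetime import datetime, timedelta, timezone


def build_retention_statements(table, ts_column, keep_days,
                               vacuum_retain_hours=168, now=None):
    """Build the DELETE and VACUUM statements for one retention run.

    `table` and `ts_column` are hypothetical names supplied by the caller.
    """
    if keep_days <= 0:
        raise ValueError("keep_days must be positive")
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=keep_days)).strftime("%Y-%m-%d %H:%M:%S")
    # Logically remove rows older than the cutoff; the data files stay on
    # storage until VACUUM removes the ones that are no longer referenced.
    delete_sql = f"DELETE FROM {table} WHERE {ts_column} < TIMESTAMP '{cutoff}'"
    # Physically remove unreferenced files older than the retention window.
    vacuum_sql = f"VACUUM {table} RETAIN {vacuum_retain_hours} HOURS"
    return [delete_sql, vacuum_sql]


def run_retention(spark, statements):
    """Run the statements in order on a Delta-enabled SparkSession."""
    for statement in statements:
        spark.sql(statement)
```

Scheduled every X days, usage would look roughly like `run_retention(spark, build_retention_statements("bronze.events", "ingested_at", keep_days=30))`, where `spark` is an existing session with Delta Lake configured.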
kyrre
07/04/2023, 11:53 AM
That makes sense - thank you!
👍 1
Dominique Brezinski
07/04/2023, 3:47 PM
Or you can DELETE every X days and set a storage retention policy, but you need to make sure there is a safety margin baked in so the policy never removes files the table still references. In general, VACUUM is the safest way to delete data files that are no longer referenced.
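One way to reason about that safety margin, as a rough sketch rather than a rule: if the DELETE predicate uses a timestamp that is never later than the time the file was written, a data file can remain referenced for up to the keep window plus one DELETE interval after it was written. The storage policy's age threshold should then be at least that, plus whatever extra margin you want. The function name and parameters below are hypothetical.

```python
def min_lifecycle_age_days(keep_days, delete_interval_days, safety_margin_days):
    """Smallest age threshold (in days) for a storage retention policy.

    Assumes row timestamps are never later than the file's write time, so a
    file is unreferenced at most keep_days + delete_interval_days after it
    was written. The safety margin is added on top of that bound.
    """
    if min(keep_days, delete_interval_days, safety_margin_days) < 0:
        raise ValueError("all arguments must be non-negative")
    return keep_days + delete_interval_days + safety_margin_days
```

For example, keeping 30 days of data, running DELETE daily, and adding a 7-day margin gives a threshold of 38 days.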