Hi all, are there any best practices for how long to keep data in Delta Lake? Say you have a table that gets updates, inserts, and deletes every day.
Is there any guidance on how to handle that?
03/27/2023, 6:07 PM
There's a lot that is subjective. In our case we look at the cost of storage and the use case, and set up automatic
jobs that trim older deleted files every now and again (e.g. once a week or once a month).
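
For illustration only, here is a rough sketch of what such a scheduled job might build. It is not our actual job: the table names, the day counts, and the helper names (`RETENTION_DAYS`, `vacuum_statement`, `build_vacuum_job`) are all made up. The only Delta-specific piece is the `VACUUM <table> RETAIN <n> HOURS [DRY RUN]` SQL syntax.

```python
# Hypothetical per-table retention policy (table names and values are invented).
RETENTION_DAYS = {
    "sales.orders": 30,       # long window: time travel is used for audits
    "sales.clickstream": 7,   # short window: storage cost matters more
}


def vacuum_statement(table: str, retain_days: int, dry_run: bool = False) -> str:
    """Build a Delta Lake VACUUM statement keeping `retain_days` of removed files."""
    if retain_days < 0:
        raise ValueError("retain_days must be non-negative")
    stmt = f"VACUUM {table} RETAIN {retain_days * 24} HOURS"
    if dry_run:
        # DRY RUN only lists the files that would be deleted.
        stmt += " DRY RUN"
    return stmt


def build_vacuum_job(policy: dict) -> list:
    """Return one VACUUM statement per table, in a stable (sorted) order."""
    return [vacuum_statement(table, days) for table, days in sorted(policy.items())]


# In the scheduled job, assuming an active SparkSession named `spark`:
# for stmt in build_vacuum_job(RETENTION_DAYS):
#     spark.sql(stmt)
```

Running the `dry_run=True` variant first is a cheap way to see what a given window would delete. Note that Delta's default retention is 7 days, and retaining less than that is blocked by a safety check unless you explicitly disable it.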
03/27/2023, 9:49 PM
It depends on how much you need time travel vs. what you pay for storage. The more aggressively you vacuum, the more you save on storage, but the less far back you can time travel. So like Tyler mentioned, it really depends on your specific circumstances (and they might be different for different tables). I wrote a blog post on vacuum that you might find useful: https://delta.io/blog/2023-01-03-delta-lake-vacuum-command/
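
To make the tradeoff concrete, here is a minimal sketch of the time-travel window a vacuum leaves behind. It is a simplification under stated assumptions: `can_time_travel` and `time_travel_cutoff` are hypothetical helpers, not part of any Delta API, and they only model the vacuum retention window (log retention, controlled separately by `delta.logRetentionDuration`, is ignored).

```python
from datetime import datetime, timedelta


def time_travel_cutoff(last_vacuum: datetime, retain_hours: int) -> datetime:
    """Oldest point in time whose data files are guaranteed to survive the vacuum."""
    return last_vacuum - timedelta(hours=retain_hours)


def can_time_travel(target: datetime, last_vacuum: datetime, retain_hours: int) -> bool:
    """True if a snapshot at `target` is still inside the retained window.

    Simplified model: files removed from the table before the cutoff have been
    deleted, so snapshots older than the cutoff may no longer be readable.
    """
    return target >= time_travel_cutoff(last_vacuum, retain_hours)


# Example: vacuum ran on 2023-03-27 with the default 7-day (168 hour) retention.
last_run = datetime(2023, 3, 27)
print(can_time_travel(datetime(2023, 3, 22), last_run, 168))  # inside the window
print(can_time_travel(datetime(2023, 3, 10), last_run, 168))  # outside the window
```

A shorter `retain_hours` moves the cutoff closer to now, which frees more storage and shrinks how far back you can safely query.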