
Dustin Salmons

08/24/2023, 9:00 PM
Time Travel Questions

Let’s say I have a Gold dataset that is used for key business reporting, and I want to use the Time Travel functionality to compare YTD metrics from this time last year to this year. However, there was a problem with last year’s dataset, and a manual operation had to be performed on it a couple of months later. It is no longer viable to query that dataset with a timestamp of “this time last year”, so some sort of documentation or metadata would be needed to remember that.

My questions:
1. Can we delete or vacuum a specific version?
2. Could we entertain the functionality of “naming” versions?

Nick Karpov

08/24/2023, 10:59 PM
you can vacuum up to a point (using the RETAIN keyword, like VACUUM ... RETAIN 100 HOURS) but I don't think that'll get you what you want... tagging/naming specific versions would be cool too, and I don't yet see an issue for it on GitHub, could you make one? same for "point in time" vacuum... you may be able to do surgery on your table manually but I don't recommend it (and haven't thought through the implications myself) ... also check out RESTORE, which can make an old snapshot current (again, not sure if it helps you but in your ballpark, maybe you'll find use)
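For anyone reading along, here is a rough sketch of what the statements mentioned above look like when spelled out. It only builds the SQL strings and does not run them; the table name gold.ytd_metrics and the version number 42 are made up for illustration, and the helper names are hypothetical. In practice you would pass the strings to something like spark.sql(...) in a session with Delta Lake enabled.

```python
def vacuum_sql(table: str, retain_hours: int) -> str:
    """Build a VACUUM statement that keeps files newer than retain_hours."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def time_travel_sql(table: str, version: int) -> str:
    """Build a query that reads the table as of a specific version number."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def restore_sql(table: str, version: int) -> str:
    """Build a RESTORE statement that makes an old version the current one."""
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"


if __name__ == "__main__":
    # Hypothetical table name and version, for illustration only.
    print(vacuum_sql("gold.ytd_metrics", 100))
    print(time_travel_sql("gold.ytd_metrics", 42))
    print(restore_sql("gold.ytd_metrics", 42))
```

Note that VACUUM with RETAIN removes unreferenced files older than the threshold as a whole, so it cannot target one specific version, which matches the point above.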

Dustin Salmons

08/25/2023, 11:49 PM
Appreciate that Nick, I think I’ll look at logging those GitHub issues and then work through the code on a contribution

JosephK (exDatabricks)

08/26/2023, 12:01 AM
For naming, you might want to do a deep clone of the table?
👍 1

Dustin Salmons

08/26/2023, 12:08 AM
Yep, that’s a fair alternative. I think the benefit of the kind of named versions I’m envisioning is that there would be no duplication of data.
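Until something like named versions exists, one possible workaround that avoids copying data is to keep the name-to-version mapping yourself, outside the table. The sketch below is only an illustration of that idea, not a Delta Lake feature: the VersionLabels class, the label fy2022_corrected, the table name, and the version number are all hypothetical. It also assumes the labelled version is still readable, which is only true while its data files and log entries have not been removed by vacuum or log cleanup.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VersionLabels:
    """Hypothetical sidecar registry mapping human-readable names to table versions."""

    table: str
    labels: Dict[str, int] = field(default_factory=dict)

    def tag(self, name: str, version: int) -> None:
        """Record a name for a version; refuse to silently repoint an existing name."""
        existing = self.labels.get(name)
        if existing is not None and existing != version:
            raise ValueError(f"label {name!r} already points at version {existing}")
        self.labels[name] = version

    def query(self, name: str) -> str:
        """Build a time travel query for the version behind a name."""
        if name not in self.labels:
            raise KeyError(f"unknown label {name!r}")
        return f"SELECT * FROM {self.table} VERSION AS OF {self.labels[name]}"


if __name__ == "__main__":
    # Hypothetical table, label, and version number.
    registry = VersionLabels("gold.ytd_metrics")
    registry.tag("fy2022_corrected", 42)
    print(registry.query("fy2022_corrected"))
```

The mapping itself would need to be persisted somewhere durable (a small table or a file under version control) for it to serve as the documentation mentioned in the original question.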