Martin
04/27/2023, 4:33 PMdf.write
.format('delta')
.partitionBy(partition_col)
.mode('append')
.option('userMetadata', json.dumps(commit_info))
.option('mergeSchema', 'true')
.option('overwriteSchema', 'true')
.option("path", path)
.saveAsTable(tablename)
Immediately, we realized our mistake and we tried "roll-back" the operation by restoring a previous version of the Delta Table.
In the _delta_log
there were JSONs for versions 745, 746, 747, 748, 749, 750 present at that time.
We tried to restore version 749 (deltaTable.restoreToVersion(749)
). The operation failed, saying that only version(s) [750, 750] are available. Also, reading a version prior to 750 did not work with the same error message.
We did not perform any vacuums in the meantime.
Does anyone have an explanation what the reason for this might be?
I tried to recreate the situation with dummy data, but was not able to. In my experiments, I was always able to restore a previous version - even after a schema change.Gerhard Brueckl
04/27/2023, 5:24 PMMartin
04/27/2023, 6:06 PMDESCRIBE HISTORY
version timestamp
754 2023-04-26T08:02:25Z
753 2023-04-26T07:55:14Z
752 2023-04-26T07:54:48Z
751 2023-04-25T16:35:57Z
750 2023-04-25T12:22:13Z
749 2023-04-25T09:14:36Z
748 2023-04-05T19:40:14Z
747 2023-03-30T09:32:30Z
delta.logRetentionDuration
and delta.deletedFileRetentionDuration
are not listed when I run SHOW TBLPROPERTIES
. That means the default values are effective, correct?
We did not delete any data files manually. Can I check if they do exist programatically or do I need to look into a version JSON and look up the files manually?Gerhard Brueckl
04/27/2023, 6:10 PM<path>
VERSION AS OF 750` work?Martin
04/27/2023, 6:44 PMSELECT * FROM table VERSION AS OF 750 LIMIT 10;
does work
SELECT * FROM table VERSION AS OF 749 LIMIT 10;
does not work
We'd like to restore 749Gerhard Brueckl
04/27/2023, 6:45 PMMartin
04/27/2023, 6:45 PMError: Cannot time travel Delta table to version 749. Available versions: [750, 754].
Gerhard Brueckl
04/27/2023, 6:46 PMMartin
04/27/2023, 6:48 PMGerhard Brueckl
04/27/2023, 6:49 PMMartin
04/27/2023, 6:50 PMGerhard Brueckl
04/27/2023, 7:39 PMMartin
04/27/2023, 7:45 PMGerhard Brueckl
04/28/2023, 6:47 AMMartin
04/28/2023, 9:30 AM_dela_log
folder (for version 750).
The checkpoint for 740 (and version JSONs 740 - 746) were deleted since they were older than default delta.logRetentionDuration
= 30 days.
However, this also indirectly renders version 747-749 unreadable since prior checkpoint and the versions in between are required to assemble the version.
This makes total sense, but I'm asking myself now if the deletion of checkpoints and version JSON is working properly or if there is a design flaw.Gerhard Brueckl
04/28/2023, 9:45 AMScott Sandre (Delta Lake)
04/28/2023, 5:27 PMDue to log entry cleanup, instances can arise where you cannot time travel to a version that is less than the retention interval. Delta Lake requires all consecutive log entries since the previous checkpoint to time travel to a particular version. For example, with a table initially consisting of log entries for versions [0, 19] and a checkpoint at verison 10, if the log entry for version 0 is cleaned up, then you cannot time travel to versions [1, 9]. Increasing the table property delta.logRetentionDuration can help avoid these situations.
logRetentionDuration
.Martin
04/29/2023, 10:32 AM