Martin

04/27/2023, 4:33 PM
Time travel / restore not possible after append
Accidentally, we performed an append operation on a Delta table that also added two columns to the target table:
(df.write
  .format('delta')
  .partitionBy(partition_col)
  .mode('append')
  .option('userMetadata', json.dumps(commit_info))
  .option('mergeSchema', 'true')
  .option('overwriteSchema', 'true')
  .option('path', path)
  .saveAsTable(tablename))
Immediately, we realized our mistake and tried to roll back the operation by restoring a previous version of the Delta table. In the _delta_log there were JSON files for versions 745, 746, 747, 748, 749, 750 present at that time. We tried to restore version 749 (deltaTable.restoreToVersion(749)). The operation failed, saying that only version(s) [750, 750] are available. Reading a version prior to 750 also failed with the same error message. We did not perform any vacuums in the meantime. Does anyone have an explanation for what the reason might be? I tried to recreate the situation with dummy data, but was not able to. In my experiments I was always able to restore a previous version, even after a schema change.
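For reference, the restore went through the DeltaTable API, roughly like this (a minimal sketch; spark, path and deltaTable correspond to the snippet above):
from delta.tables import DeltaTable

# Load the table by its path (DeltaTable.forName(spark, tablename) works too).
deltaTable = DeltaTable.forPath(spark, path)

# Roll back to the last version before the accidental append.
# In our case this failed with: only version(s) [750, 750] are available.
deltaTable.restoreToVersion(749)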
Gerhard Brueckl

04/27/2023, 5:24 PM
How long ago were those versions created? Do the underlying data files still exist? Have a look at delta.logRetentionDuration and delta.deletedFileRetentionDuration.
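For reference, a sketch of how to inspect them (SHOW TBLPROPERTIES only returns properties that were set explicitly, so their absence means the defaults apply):
# Lists only explicitly-set properties; unset ones fall back to the defaults
# (delta.logRetentionDuration = interval 30 days,
#  delta.deletedFileRetentionDuration = interval 7 days).
spark.sql(f"SHOW TBLPROPERTIES {tablename}").show(truncate=False)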
Martin

04/27/2023, 6:06 PM
DESCRIBE HISTORY gives:
version    timestamp
754        2023-04-26T08:02:25Z
753        2023-04-26T07:55:14Z
752        2023-04-26T07:54:48Z
751        2023-04-25T16:35:57Z
750        2023-04-25T12:22:13Z
749        2023-04-25T09:14:36Z
748        2023-04-05T19:40:14Z
747        2023-03-30T09:32:30Z
delta.logRetentionDuration and delta.deletedFileRetentionDuration are not listed when I run SHOW TBLPROPERTIES. That means the default values are in effect, correct? We did not delete any data files manually. Can I check programmatically whether the data files still exist, or do I need to open a version JSON and look the files up manually?
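For reference, one way to check programmatically (a sketch assuming the table sits on a local filesystem; on object storage the same idea applies via the respective client, and add-action paths may be URL-encoded):
import json
import os

# Commit files are named with the version zero-padded to 20 digits.
log_file = os.path.join(path, '_delta_log', f'{749:020d}.json')

# Each line of a commit file is one action; 'add' actions reference data files.
with open(log_file) as f:
    actions = [json.loads(line) for line in f]

data_files = [a['add']['path'] for a in actions if 'add' in a]
missing = [p for p in data_files if not os.path.exists(os.path.join(path, p))]
print(f'{len(missing)} of {len(data_files)} referenced data files are missing')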
Gerhard Brueckl

04/27/2023, 6:10 PM
does a `SELECT * FROM delta.<path> VERSION AS OF 750` work?
Martin

04/27/2023, 6:44 PM
SELECT * FROM table VERSION AS OF 750 LIMIT 10;
does work
SELECT * FROM table VERSION AS OF 749 LIMIT 10;
does not work We'd like to restore 749
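The same checks through the DataFrame reader (versionAsOf is the documented time-travel option; path as in the write snippet above):
# works
spark.read.format('delta').option('versionAsOf', 750).load(path).show(10)

# fails with the same "Cannot time travel" error
spark.read.format('delta').option('versionAsOf', 749).load(path).show(10)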
Gerhard Brueckl

04/27/2023, 6:45 PM
I guess you receive the same error message?
Martin

04/27/2023, 6:45 PM
yes
Error: Cannot time travel Delta table to version 749. Available versions: [750, 754].
Gerhard Brueckl

04/27/2023, 6:46 PM
what about 751 ?!
Martin

04/27/2023, 6:48 PM
751 works fine. 750 was the accidental append with the schema change. One more thing: we ran the operations prior to 750 on a Spark 3.2 & Delta 1.2 cluster; the accidental append was on Spark 3.3 & Delta 2.2. Could this have "damaged" the history?
Gerhard Brueckl

04/27/2023, 6:49 PM
well it could
can you try to read those versions with the old setup?
Martin

04/27/2023, 6:50 PM
I think we tried, but I'll double-check
no luck with the old setup 😟 same error messages
Gerhard Brueckl

04/27/2023, 7:39 PM
hmm, weird
did you change minReader/minWriter table properties?
Martin

04/27/2023, 7:45 PM
It seems that the history broke between 749 and 750. Since we were not able to restore 749, we decided to raise minReader/minWriter (that is version 752) in order to be able to fix the table and drop the wrongfully added columns manually.
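A sketch of that manual fix, with col_a and col_b standing in for the two wrongfully added columns (hypothetical names); dropping columns requires column mapping, which is why the protocol versions have to be raised first:
# Enable column mapping; this requires protocol reader 2 / writer 5.
spark.sql(f"""
    ALTER TABLE {tablename} SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")

# Drop the columns introduced by the accidental mergeSchema append.
spark.sql(f"ALTER TABLE {tablename} DROP COLUMNS (col_a, col_b)")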
Gerhard Brueckl

04/28/2023, 6:47 AM
I would probably go to the _delta_log folder and check that specific version manually. As it is divisible by 10, I guess it also created a checkpoint - maybe something happened there - but then 751 would not work anymore either 🤔
Martin

04/28/2023, 9:30 AM
I do have a theory now: there is only one checkpoint present in my _delta_log folder (for version 750). The checkpoint for 740 (and the version JSONs 740-746) were deleted because they were older than the default delta.logRetentionDuration = 30 days. However, this also indirectly renders versions 747-749 unreadable, since the prior checkpoint plus all versions in between are required to reconstruct a version. This makes total sense, but I'm now asking myself whether the deletion of checkpoints and version JSONs is working properly, or whether there is a design flaw.
So in my scenario it was not missing data files but missing log files / a missing checkpoint that prevented time travel.
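The theory can be verified by listing the log directory (a local-filesystem sketch; substitute the appropriate listing call for object storage):
import os

log_dir = os.path.join(path, '_delta_log')
for name in sorted(os.listdir(log_dir)):
    if name.endswith('.json') or '.checkpoint.' in name:
        print(name)
# A version is only readable if the log still holds an unbroken chain of
# commit JSONs back to a surviving checkpoint (or back to version 0).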
Gerhard Brueckl

04/28/2023, 9:45 AM
hmm, good observation and findings. A checkpoint should only be deleted if all other versions that rely on it are already deleted as well. Not sure if there is special logic for that in the current code. @Scott Sandre (Delta Lake) ?
Scott Sandre (Delta Lake)

04/28/2023, 5:27 PM
Hey all! Yup this is a known scenario. Cleaning up the log can break some time travel queries. https://docs.delta.io/latest/delta-batch.html#-data-retention
Due to log entry cleanup, instances can arise where you cannot time travel to a version that is less than the retention interval. Delta Lake requires all consecutive log entries since the previous checkpoint to time travel to a particular version. For example, with a table initially consisting of log entries for versions [0, 19] and a checkpoint at version 10, if the log entry for version 0 is cleaned up, then you cannot time travel to versions [1, 9]. Increasing the table property delta.logRetentionDuration can help avoid these situations.
The only solution would be to not delete those log files ... but that's the entire point of logRetentionDuration.
Although I agree that it would be nice to have the semantics of: only delete these log files if they don't break any time travel queries.
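Per the excerpt, the forward-looking mitigation is to widen the retention window before cleanup runs, e.g. (the 90-day interval is only illustrative):
spark.sql(f"""
    ALTER TABLE {tablename}
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 90 days')
""")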
Martin

04/29/2023, 10:32 AM
I created a feature request for this: https://github.com/delta-io/delta/issues/1728