
GapyNi

05/18/2023, 1:15 PM
Hi all, we want to move streaming Delta tables from one directory to another. We used Deep Clone, but it seems that Deep Clone creates the last checkpoint in the same folder as the Delta table, whereas our streaming checkpoints are defined in a different folder. The error we are getting is "Delta table ... does not exist, Please delete your streaming query checkpoint and restart". Should we avoid Deep Clone when checkpoints are defined in a different folder, or do we have to delete the existing checkpoints (in the separate folder)? Thanks and regards, GapyNi

Christopher Grant

05/18/2023, 6:06 PM
The recommended approach is to use a new checkpoint and specify the oldest unprocessed version as `startingVersion` in the readStream options. For example, if your old stream has processed all the way through version 128, your new stream should start on version 129:
spark.readStream.option("startingVersion", 129)...
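A slightly fuller sketch of that idea, in case it helps. It assumes a PySpark session is passed in as `spark`. The function names and the path arguments are made up for illustration, and the append-mode Delta sink is an assumption about the setup. As far as I know, `startingVersion` only takes effect when the query starts with a fresh checkpoint, which is why the checkpoint location here has to be a new, empty folder.

```python
def next_starting_version(last_processed_version: int) -> int:
    """Return the oldest unprocessed version, i.e. the one right after
    the last version the old stream fully processed."""
    if last_processed_version < 0:
        raise ValueError("last_processed_version must be >= 0")
    return last_processed_version + 1


def restart_stream(spark, source_path, target_path, new_checkpoint_path,
                   last_processed_version):
    """Start a new stream on the moved table with a fresh checkpoint.

    All paths are hypothetical placeholders supplied by the caller;
    new_checkpoint_path must not contain an old checkpoint.
    """
    df = (
        spark.readStream.format("delta")
        # Begin reading at the first version the old stream has not seen.
        .option("startingVersion", next_starting_version(last_processed_version))
        .load(source_path)
    )
    return (
        df.writeStream.format("delta")
        .option("checkpointLocation", new_checkpoint_path)
        .outputMode("append")
        .start(target_path)
    )
```

With the numbers above, `next_starting_version(128)` gives 129, so the new stream picks up exactly where the old one stopped.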

GapyNi

05/18/2023, 8:21 PM
I have been experimenting a bit with it. If I do a deep clone, copy over the checkpoint, and start the stream with "append", it correctly reads the copied `_checkpoint` folder and starts from where it left off. The next stream uses ForEachBatch with "`update`", and that one was failing, so we deleted the `_checkpoint` folder and now it works. Yes, we are using `max(version) from (describe history ....)`, meaning we start from the latest available version.
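For reference, a rough sketch of how that version lookup might look from PySpark. It assumes `spark` is an active session with Delta Lake enabled. The function name `latest_table_version` and the `table_path` argument are hypothetical, and the maximum is taken in Python rather than inside the SQL statement. Note that this returns the latest committed version of the table, which matches the oldest unprocessed version only if nothing is left to process before it.

```python
def latest_table_version(spark, table_path: str) -> int:
    """Return the highest version listed in the table's Delta history.

    table_path is a hypothetical placeholder for the table's directory.
    """
    history = spark.sql(f"DESCRIBE HISTORY delta.`{table_path}`")
    # Each history row describes one commit; "version" is its number.
    versions = [row["version"] for row in history.select("version").collect()]
    if not versions:
        raise ValueError(f"no history found for {table_path}")
    return max(versions)
```

The result can then be passed as the `startingVersion` option when the stream is started with a new checkpoint folder.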