Rahul Sharma

02/08/2023, 5:12 AM
Hello team, I scheduled a Delta Lake job (raw and refine) 10 days ago and set both the Delta file retention and log retention durations to 5 days. I am getting the error below. Can someone tell me whether I should change the retention duration back to the default, or is this error related to something else?
"pyspark.sql.utils.StreamingQueryException: The stream from your Delta table was expecting process data from version 142,\nbut the earliest available version in the _delta_log directory is 159. The files\nin the transaction log may have been deleted due to log cleanup. In order to avoid losing\ndata, we recommend that you restart your stream with a new checkpoint location and to\nincrease your delta.logRetentionDuration setting, if you have explicitly set it below 30\ndays.\nIf you would like to ignore the missed data and continue your stream from where it left\noff, you can set the .option(\"failOnDataLoss\", \"false\") as part\nof your readStream statement.\n=== Streaming Query ===\nIdentifier: JWR-SEC-

Gerhard Brueckl

02/08/2023, 7:57 AM
That's expected: you broke the stream by deleting files before the stream had processed them.

Rahul Sharma

02/08/2023, 8:57 AM
But my stream keeps running, so files should only be deleted after their data has been processed.

Gerhard Brueckl

02/08/2023, 5:00 PM
A stream only reads; it does not delete any files it has already processed. It just keeps track of what has been processed so that, if the stream gets aborted, it can restart from where it previously stopped. In your case, the stream was stopped, e.g., on Monday, then data up until Wednesday was deleted, and then you restarted the stream, which tried to read the data for Tuesday. That data no longer exists because it was already deleted.
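To keep log entries around long enough for a stopped stream to catch up, the two retention table properties can be raised. The following is only a minimal sketch: the helper `retention_sql` and the table name `raw_events` are hypothetical, and the 30-day and 7-day values are the Delta defaults for `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration`, not values taken from this thread.

```python
def retention_sql(table: str, log_days: int = 30, file_days: int = 7) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties.

    `table` is a hypothetical table name; the statement itself uses the
    standard Delta table properties for log and deleted-file retention.
    """
    if log_days < 1 or file_days < 1:
        raise ValueError("retention must be at least 1 day")
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_days} days', "
        f"'delta.deletedFileRetentionDuration' = 'interval {file_days} days')"
    )


# Hypothetical table name, used only for illustration.
stmt = retention_sql("raw_events", log_days=30, file_days=7)
print(stmt)

# With an active SparkSession named `spark`, the statement would be run as:
#   spark.sql(stmt)
```

Raising retention only protects future runs; log versions that were already cleaned up are not restored by it.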

Rahul Sharma

02/09/2023, 4:11 AM
Is there any way to resume from the data at our last checkpoint?
If I change the checkpoint location, I would get the data from the starting point again, but I only want to resume.

Gerhard Brueckl

02/09/2023, 8:09 AM
If you remove or change the checkpoint location, the stream will start from the very beginning. Alternatively, you can specify where the stream should start when you create it, using the options described at https://docs.delta.io/latest/delta-streaming.html#specify-initial-position
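As a rough sketch of that second option: the Delta streaming source accepts `startingVersion` or `startingTimestamp` on `readStream`, and these take effect only for a new query, so they need a fresh checkpoint location. The helper names below, the table path, and the choice of version 159 (the earliest version still available according to the error message) are assumptions for illustration only.

```python
from typing import Dict, Optional


def starting_position_options(
    version: Optional[int] = None, timestamp: Optional[str] = None
) -> Dict[str, str]:
    """Return reader options for exactly one Delta starting position."""
    if (version is None) == (timestamp is None):
        raise ValueError("set exactly one of version or timestamp")
    if version is not None:
        return {"startingVersion": str(version)}
    return {"startingTimestamp": timestamp}


def read_delta_stream(spark, table_path: str, options: Dict[str, str]):
    """Create a streaming DataFrame over a Delta table.

    `spark` is an existing SparkSession and `table_path` is a hypothetical
    path; pair this with a NEW checkpoint location in writeStream, because
    the starting options are ignored once a checkpoint has recorded progress.
    """
    return spark.readStream.format("delta").options(**options).load(table_path)


# Start from the earliest version the error message reports as available.
opts = starting_position_options(version=159)
print(opts)
```

Starting at a later version skips whatever changes were recorded in the versions that have been cleaned up, so this resumes the stream but does not recover the missed data.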