Maico Timmerman

07/10/2023, 8:40 AM
Hi folks, we run the open-source variant of Delta and ran into a problem: a single file has been deleted from our storage layer while it is still referenced from the DeltaLog. The file is unrecoverable, and open-source Delta has no FSCK functionality. I searched the docs but cannot find the recommended approach to solve this. Any tips?

Maico Timmerman

07/10/2023, 12:26 PM
That will just silence the error, which I think is bad practice in case files go missing in the future. It also does not work for non-Spark query engines. I'm looking for a resolution in which I acknowledge to my table that the file is gone. Is there a way to do that? The only way I can think of is to run VACUUM with 0 retention, delete the Delta log folder, and reconvert the table to Delta. But that is an intrusive measure for a single missing file.

Dominique Brezinski

07/10/2023, 6:03 PM
Agreed. You may be able to do surgery on the Delta log using the Delta Standalone or delta-rs interfaces to simply add a remove (RemoveFile) action to the log for the specific file you are missing. That will drop the file from all versions of the table going forward; earlier versions will still reference it.
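
For illustration only, here is a rough sketch of what that log surgery amounts to at the file level. It does not use Delta Standalone or delta-rs; it writes the next commit file by hand with the standard library, following the Delta protocol's JSON commit format. Assumptions: the table lives on a local filesystem, no other writer is committing at the same time, and the path passed in matches the path recorded in the original add action exactly. The function name `append_remove_action` and the operation label "MANUAL REMOVE" are made up for this example. Back up the `_delta_log` directory before trying anything like this.

```python
import json
import re
import time
from pathlib import Path

# Delta commit files are named with a zero-padded 20-digit version number.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def append_remove_action(table_root: str, missing_path: str) -> Path:
    """Write a new commit containing a single `remove` action.

    Hypothetical helper, not part of any Delta library. Assumes a local
    filesystem and no concurrent writers.
    """
    log_dir = Path(table_root) / "_delta_log"
    versions = [
        int(m.group(1))
        for p in log_dir.iterdir()
        if (m := COMMIT_RE.match(p.name))
    ]
    if not versions:
        raise FileNotFoundError(f"no commit files found in {log_dir}")

    next_version = max(versions) + 1
    now_ms = int(time.time() * 1000)

    actions = [
        # commitInfo is free-form provenance; the operation name is invented here.
        {"commitInfo": {"timestamp": now_ms, "operation": "MANUAL REMOVE"}},
        # The remove action tombstones the file for this and later versions.
        {
            "remove": {
                "path": missing_path,
                "deletionTimestamp": now_ms,
                "dataChange": True,
            }
        },
    ]

    commit_file = log_dir / f"{next_version:020d}.json"
    # Mode "x" fails if the file already exists, so an existing commit
    # is never overwritten.
    with open(commit_file, "x", encoding="utf-8") as fh:
        for action in actions:
            fh.write(json.dumps(action) + "\n")
    return commit_file
```

A call would look like `append_remove_action("/data/my_table", "part-00000-....snappy.parquet")`, where both arguments are placeholders. On object stores, or with concurrent writers, going through a library that implements the commit protocol is the safer route.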