Anjaneya Alluri

02/21/2023, 1:43 AM
Hi team, we are in the process of upgrading to Delta 2.2.0 from our current environment on Delta 1.0.0. Here is our scenario: 1) set a default 'deletedFileRetentionDuration' of 7 days on the Delta table; 2) append a data file, let's say as version 5; 3) the file is deleted on day three after it was appended; 4) run vacuum on day ten. Observation: on day 11, the Spark action 'count' on version 5 of the Delta table used to fail with Delta 1.0.0, but with 2.2.0 it succeeds. In Delta 2.2.0, we only see the file-not-found Spark exception when we use other Spark actions such as "collect" or "show". Has anyone else seen this behavior as a side effect of upgrading to Delta 2.2.0?

Dominique Brezinski

02/21/2023, 4:01 AM
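I believe that is because straight counts are a metadata operation in 2.2.0, but they were not in 1.0.0. So the metadata for version 5 still exists, because the default log retention (logRetentionDuration) is 30 days, but the data file does not, because of your 7-day deleted-file retention setting and the vacuum. Actions like "collect" or "show" have to read the data file itself, which is why they still fail.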
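
To make the distinction concrete, here is a rough toy model in plain Python. It does not use Spark or the Delta Lake API; the names `AddFile`, `num_records`, `count_from_metadata`, and `collect_rows` are made up for illustration, and it simply assumes the log keeps a per-file row count. It only sketches why a count answered from log metadata can succeed after vacuum while a read of the data cannot.

```python
# Toy model only: plain Python stand-ins, not the Delta Lake API.
from dataclasses import dataclass


@dataclass
class AddFile:
    path: str
    num_records: int  # assumed per-file row count stored in the log


# Log entry for version 5: still present, since log retention is longer.
log_v5 = [AddFile("part-0001.parquet", 3)]

# Data files on storage before vacuum (hypothetical file name and rows).
storage = {"part-0001.parquet": ["a", "b", "c"]}


def vacuum(storage, paths):
    """Physically remove data files that are past deleted-file retention."""
    for p in paths:
        storage.pop(p, None)


def count_from_metadata(log):
    """Answer a count from log stats alone, never touching data files."""
    return sum(f.num_records for f in log)


def collect_rows(log, storage):
    """Read the rows themselves, which requires the data files to exist."""
    rows = []
    for f in log:
        if f.path not in storage:
            raise FileNotFoundError(f.path)
        rows.extend(storage[f.path])
    return rows


vacuum(storage, ["part-0001.parquet"])

metadata_count = count_from_metadata(log_v5)  # succeeds: uses log stats only
try:
    collect_rows(log_v5, storage)
    collect_failed = False
except FileNotFoundError:
    collect_failed = True  # fails: the data file is gone

print(metadata_count, collect_failed)  # prints: 3 True
```

In this sketch the count never opens the data file, so it keeps working until the log entry itself is cleaned up, while anything that needs the rows fails as soon as vacuum has removed the file.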