Morgan

02/06/2023, 10:59 AM
Hi all, I'm running some tests on log retention. I intentionally set 'logRetentionDuration' to "interval 0 days" and I'm doing append-only writes to the current Delta table, but the transaction log files are never removed. Did I miss something? (Delta 2.1.1)
Scott Sandre (Delta Lake)

02/06/2023, 4:57 PM
`logRetentionDuration` refers to how long "log" (aka .json) files are kept in the Delta log. However, I can't just write a `5.json` and then immediately clean it up (delete it). We have to wait until we do the next checkpoint (e.g. `10.checkpoint.parquet`) before we can clean up previous logs. So - are you doing enough transactions to perform a checkpoint?
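
A rough sketch of the rule described above, in case it helps when inspecting a `_delta_log` listing by hand. It is a simplified model, not Delta's actual cleanup code: the function name `commits_before_latest_checkpoint` is hypothetical, it only looks at single-file checkpoints, it treats commits strictly below the latest checkpoint version as candidates (an assumption), and it ignores the retention-age check entirely.

```python
import re
from typing import List

# Commit files, e.g. 00000000000000000005.json
COMMIT_RE = re.compile(r"^(\d+)\.json$")
# Single-file checkpoints, e.g. 00000000000000000010.checkpoint.parquet
CHECKPOINT_RE = re.compile(r"^(\d+)\.checkpoint\.parquet$")


def commits_before_latest_checkpoint(file_names: List[str]) -> List[str]:
    """Return commit files whose version is below the latest checkpoint.

    Simplified model (assumption): a commit file can only become a cleanup
    candidate once a later checkpoint exists. The retention duration is
    NOT modeled here.
    """
    checkpoint_versions = []
    for name in file_names:
        match = CHECKPOINT_RE.match(name)
        if match:
            checkpoint_versions.append(int(match.group(1)))

    if not checkpoint_versions:
        # No checkpoint yet, so nothing can be cleaned up.
        return []

    latest = max(checkpoint_versions)
    candidates = []
    for name in file_names:
        match = COMMIT_RE.match(name)
        if match and int(match.group(1)) < latest:
            candidates.append(name)
    return sorted(candidates)
```

Passing it the output of `os.listdir` on the table's `_delta_log` directory gives the commit files that a checkpoint already sits ahead of; an empty result means no checkpoint has been written yet.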
Morgan

02/06/2023, 5:22 PM
Yes, I have 86 files, with one checkpoint file for every 10 JSON files.
Here is the content of my first log file:
{
  "protocol": {
    "minReaderVersion": 1,
    "minWriterVersion": 2
  }
}
{
  "metaData": {
    "id": "9824d1d0-ccac-4cec-a9c1-4bb174179b33",
    "format": {
      "provider": "parquet",
      "options": {}
    },
    "schemaString": "{\"type\":\"struct\",\"fields\":[]}",
    "partitionColumns": [],
    "configuration": {
      "delta.deletedFileRetentionDuration": "interval 0 days",
      "delta.appendOnly": "true",
      "delta.logRetentionDuration": "interval 0 days",
      "delta.dataSkippingNumIndexedCols": "0",
      "delta.checkpointRetentionDuration": "0 days"
    },
    "createdTime": 1675678319032
  }
}
{
  "commitInfo": {
    "timestamp": 1675678319533,
    "operation": "CREATE TABLE",
    "operationParameters": {
      "isManaged": "false",
      "description": null,
      "partitionBy": "[]",
      "properties": "{\"delta.deletedFileRetentionDuration\":\"interval 0 days\",\"delta.appendOnly\":\"true\",\"delta.logRetentionDuration\":\"interval 0 days\",\"delta.dataSkippingNumIndexedCols\":\"0\",\"delta.checkpointRetentionDuration\":\"0 days\"}"
    },
    "isolationLevel": "Serializable",
    "isBlindAppend": true,
    "operationMetrics": {},
    "engineInfo": "Apache-Spark/3.3.1 Delta-Lake/2.1.1",
    "txnId": "86d2a833-d470-4d11-b4dd-1cdeac1be011"
  }
}
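
For anyone wanting to check the same properties on their own table, here is a minimal sketch that reads the actions out of a commit file and returns the table configuration. The helper names `parse_actions` and `table_configuration` are hypothetical; the parser accepts both one-object-per-line files and pretty-printed text like the dump above, and it assumes the configuration lives under `metaData.configuration` as shown there.

```python
import json
from typing import Any, Dict, List


def parse_actions(text: str) -> List[Dict[str, Any]]:
    """Parse a sequence of JSON objects separated only by whitespace."""
    decoder = json.JSONDecoder()
    actions = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        actions.append(obj)
    return actions


def table_configuration(actions: List[Dict[str, Any]]) -> Dict[str, str]:
    """Return the configuration map of the first metaData action, if any."""
    for action in actions:
        if "metaData" in action:
            return action["metaData"].get("configuration", {})
    return {}
```

Calling `table_configuration(parse_actions(open(path).read()))` on the first commit file should show whether `delta.logRetentionDuration` was actually recorded in the table metadata.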
Scott Sandre (Delta Lake)

02/07/2023, 5:07 PM
Which environment are you running this on? Locally? EMR? Do you think you could put together a reproducible example and create an issue at https://github.com/delta-io/delta?
Morgan

03/13/2023, 7:38 AM
Hi Scott, I did not have time to work on this until now. I had left the problem aside because I had no problem in production with Spark (not local mode). But this weekend I tried to test some things about Delta retention with a Docker image that runs Spark in local mode by default. I was again confronted with the same problem: no cleaning process seems to be executed when checkpoint files are created. I use the same application code in both cases, but cleaning never seems to occur in local mode.