Prashanth Ramanna
09/16/2023, 7:51 PM
"Support for disabling Delta checkpointing during commits - For very large tables with millions of files, performing Delta checkpoints can become an expensive overhead during writes. Users can now disable this checkpointing by setting the hadoop configuration property io.delta.standalone.checkpointing.enabled to false. This is only safe and suggested to do if another job will periodically perform the checkpointing."
I looked into the codebase, but the Standalone APIs do not expose checkpointing capability. So here are my two questions:
1. How can I configure another job (preferably using Standalone) to perform checkpointing?
2. Are there any other recommendations/learnings for handling large Delta tables?
Ashok Krishna
09/17/2023, 3:20 PM
Prashanth Ramanna
09/17/2023, 7:41 PM
io.delta.standalone.checkpointing.enabled set to false
3. Create a new "DeltaCheckpointer Service", which periodically wakes up and performs checkpointing.
• delta.checkpointInterval will be set to 1
• io.delta.standalone.checkpointing.enabled will be set to true
• Since Standalone doesn't have APIs to commit, I plan to listen to one of the notifications from the worker fleet and use it to perform a commit.
I would prefer that DeltaCheckpointer just checkpoint without any integration with the worker fleet, but I haven't been able to make it work without a valid AddFile in the commit info.
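A minimal sketch of why the plan above depends on the checkpointer making a commit of its own. It assumes Standalone decides whether to checkpoint right after each successful commit, using a rule roughly like the one below. The class and method names are hypothetical and not part of the Standalone API, and the rule itself is my reading of the behaviour, not something confirmed in this thread.

```java
// Hypothetical model of the post-commit checkpoint decision.
// Assumption: a checkpoint is written only when checkpointing is enabled,
// the committed version is not 0, and the version is a multiple of
// delta.checkpointInterval.
class CheckpointPolicy {

    static boolean shouldCheckpoint(boolean checkpointingEnabled,
                                    long committedVersion,
                                    int checkpointInterval) {
        return checkpointingEnabled
                && committedVersion != 0
                && committedVersion % checkpointInterval == 0;
    }

    public static void main(String[] args) {
        // Worker fleet: io.delta.standalone.checkpointing.enabled = false,
        // so no commit from a worker ever triggers a checkpoint.
        System.out.println("worker, version 10: "
                + shouldCheckpoint(false, 10, 1));

        // DeltaCheckpointer: enabled = true and delta.checkpointInterval = 1,
        // so every commit it makes (after version 0) triggers a checkpoint.
        System.out.println("checkpointer, version 11: "
                + shouldCheckpoint(true, 11, 1));
    }
}
```

Under this assumption the checkpoint is a side effect of a commit, which would explain why the service needs something valid to commit before it can checkpoint.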