Prashanth Ramanna 09/16/2023, 7:51 PM
Support for disabling Delta checkpointing during commits - For very large tables with millions of files, performing Delta checkpoints can become an expensive overhead during writes. Users can now disable this checkpointing by setting the hadoop configuration property io.delta.standalone.checkpointing.enabled to false. This is only safe and suggested to do if another job will periodically perform the checkpointing.
I looked into the codebase, but the Standalone APIs do not expose a checkpointing capability. So here are my two questions:
1. How can I configure another job (preferably using Standalone) to perform checkpointing?
2. Are there any other recommendations/learnings for handling large Delta tables?
Ashok Krishna 09/17/2023, 3:20 PM
Prashanth Ramanna 09/17/2023, 7:41 PM
set to false
3. Create a new "DeltaCheckpointer Service", which periodically wakes up and performs checkpointing.
• will be set to 1
• will be set to true
• Since Standalone doesn't have APIs to commit, I plan to listen to one of the notifications from the worker fleet and use it to perform a commit. I would prefer that DeltaCheckpointer just checkpoint without any integration with the worker fleet, but I haven't been able to make it work without a valid AddFile in the commit info.
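The periodic wake-up part of the "DeltaCheckpointer" idea in this message could be sketched roughly as below. This is only a scheduling skeleton under stated assumptions: the class name, `latestVersion`, and `writeCheckpoint` are hypothetical hooks supplied by the caller, not Delta Standalone APIs, and how the checkpoint itself gets written (for example, via a commit from a process that has checkpointing enabled) is left open.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names come from Delta Standalone.
final class DeltaCheckpointerService implements AutoCloseable {
    private final LongSupplier latestVersion;    // assumed hook: latest committed table version
    private final LongConsumer writeCheckpoint;  // assumed hook: writes a checkpoint at a version
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private long lastCheckpointed = -1L;         // nothing checkpointed by this service yet

    public DeltaCheckpointerService(LongSupplier latestVersion, LongConsumer writeCheckpoint) {
        this.latestVersion = latestVersion;
        this.writeCheckpoint = writeCheckpoint;
    }

    /** One wake-up: checkpoint only if the table advanced. Returns true if one was written. */
    public synchronized boolean runOnce() {
        long version = latestVersion.getAsLong();
        if (version <= lastCheckpointed) {
            return false; // no new commits since the last checkpoint
        }
        writeCheckpoint.accept(version);
        lastCheckpointed = version;
        return true;
    }

    /** Runs immediately, then waits `delay` after each run finishes before the next one. */
    public void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("checkpoint attempt failed: " + e);
            }
        }, 0L, delay, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would construct it with its own two hooks and call something like `start(10, TimeUnit.MINUTES)`. Skipping the wake-up when the version has not advanced keeps the service from rewriting a checkpoint for a table that received no new commits.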