Simon Thelin

07/06/2023, 10:20 AM
Does anyone here have experience with running a Spark CDF job as a batch job? Essentially, we have a scenario where there will be bursts of data coming in, and we want to process them in a streaming fashion via Delta CDF. Then, when inactivity hits, we shut the job down until it is triggered again by a user. This feels smoother than a full-on batch job, but we also don't need it to stay alive constantly.

Dominique Brezinski

07/06/2023, 4:56 PM
Just use the stream mechanics with an available-now (run-once) trigger. When you start the job it will read the change data feed from the last check-pointed position until the end, process the data, and terminate.
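A minimal sketch of that pattern in PySpark, assuming a Spark session with the Delta Lake extensions already configured (Spark 3.3+ for `availableNow`); the table and checkpoint paths are placeholders:

```python
from pyspark.sql import SparkSession

# Assumes Delta Lake is on the classpath and the session is
# configured with the Delta SQL extensions.
spark = SparkSession.builder.getOrCreate()

# Read the source table's change data feed as a stream.
changes = (
    spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .load("/path/to/source_table")  # placeholder path
)

# availableNow=True processes everything from the last checkpointed
# position up to the current end of the feed, then the query
# terminates on its own -- no always-on cluster needed.
query = (
    changes.writeStream
    .format("delta")
    .option("checkpointLocation", "/path/to/_checkpoint")  # placeholder path
    .trigger(availableNow=True)
    .start("/path/to/target_table")  # placeholder path
)

query.awaitTermination()
```

Each run picks up from the checkpoint, so triggering the job on demand (e.g. from an orchestrator when a user asks) gives incremental, exactly-once processing without the job idling between bursts.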

S Thelin

07/10/2023, 5:25 AM
Would this work for CDF as well @Dominique Brezinski? I thought that only worked with traditional checkpoints. I was trying to read the docs, and to me it seems a bit different to write a CDF job compared to a traditional streaming job with a checkpoint and HWM.