I’d also like to know: what happens if we don’t have checkpoints?
02/02/2023, 10:54 PM
The stream will start from the beginning of the Source’s offset/history/version, so output to the Sink may duplicate prior output. Put another way, there will be no guarantee of exactly-once semantics from Source to Sink.
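To make the duplication concrete, here is a minimal toy sketch in plain Python. It is not tied to any particular streaming engine: `SOURCE`, `run_stream`, and the `next_offset` key are hypothetical names invented for illustration, and the "checkpoint" is just a dict that remembers where the last run stopped.

```python
from typing import List, Optional

# Hypothetical source records; the list index plays the role of the offset.
SOURCE = ["e0", "e1", "e2", "e3", "e4"]

def run_stream(sink: List[str], checkpoint: Optional[dict], upto: int) -> None:
    """Copy source records with offset < upto into the sink."""
    # With a checkpoint, resume where the last run stopped; without one, start at 0.
    start = checkpoint["next_offset"] if checkpoint is not None else 0
    for offset in range(start, upto):
        sink.append(SOURCE[offset])
        if checkpoint is not None:
            checkpoint["next_offset"] = offset + 1

# Two runs sharing one checkpoint: each record reaches the sink once.
sink_with: List[str] = []
ckpt = {"next_offset": 0}
run_stream(sink_with, ckpt, upto=3)
run_stream(sink_with, ckpt, upto=5)

# Two runs with no checkpoint: the second run replays from the beginning.
sink_without: List[str] = []
run_stream(sink_without, None, upto=3)
run_stream(sink_without, None, upto=5)
```

In this toy model `sink_with` ends up equal to `SOURCE`, while `sink_without` contains the first three records twice. A real engine tracks much more state than a single offset, so treat this only as an illustration of the restart behavior described above.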
Assume we always run production jobs with checkpoints. If for some reason we have to restart the stream with a new checkpoint, we would try to figure out the last successfully checkpointed Source offset and start the stream from the corresponding version/timestamp/offset to minimize duplicate data.
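A rough sketch of why this minimizes duplicates rather than eliminating them, again in plain Python with made-up numbers. `replayed_offsets`, `last_checkpointed`, and `last_written` are hypothetical names, and the sketch assumes offsets are contiguous integers and that some records may have reached the Sink after the last checkpoint was taken.

```python
from typing import List

def replayed_offsets(start_offset: int, last_written: int) -> List[int]:
    """Offsets that are already in the sink and would be written again."""
    return list(range(start_offset, last_written + 1))

last_checkpointed = 97  # hypothetical: last offset recorded in the old checkpoint
last_written = 99       # hypothetical: last offset that actually reached the sink

# Restarting with a new checkpoint and no starting position replays everything.
from_beginning = replayed_offsets(0, last_written)

# Restarting just after the last checkpointed offset replays only the tail.
from_checkpoint = replayed_offsets(last_checkpointed + 1, last_written)
```

With these numbers, starting from the beginning rewrites 100 records, while starting after the last checkpointed offset rewrites only offsets 98 and 99. If the Sink cannot tolerate even that, it would need its own deduplication or idempotent writes.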