Cam Gallucci

06/05/2023, 5:51 PM
Hey folks, I am pulling data using the delta-sharing protocol and the Python SDK, and I have a question about how the Change Data Feed is handled. Consider the following scenario:
1. I load the whole table at the current version, say version 5, with the `delta_sharing.load_as_spark(table_url)` function.
2. An update to record x occurs in version 6.
3. Another update to record x occurs in version 7.
4. I load the data from version 5 to 7 to get the most recent updates with `delta_sharing.load_table_changes_as_spark(table_url, starting_version=5, ending_version=7)`.
Will step 4 give me one combined update for record x (with the `update_preimage` from version 5), or will it have both updates (two `update_preimage` rows, one from version 5 and one from version 6)? And if record x had been deleted in step 3 instead of updated, would only the delete come through? Appreciate any clarification on this, thanks!
Gerhard Brueckl

06/05/2023, 6:30 PM
I think it should contain both updates, and you will need to deduplicate them yourself. Better to test this on your own, though :)
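The deduplication mentioned above could look roughly like the sketch below. It assumes the change feed returns one row per change per commit and carries the usual `_change_type` and `_commit_version` metadata columns. The key column `id`, the `value` column, the sample rows, and the helper `latest_change_per_key` are all hypothetical, and plain dictionaries stand in for the rows of the Spark DataFrame so the logic is easy to follow without a cluster.

```python
from typing import Any, Dict, Hashable, Iterable

Row = Dict[str, Any]


def latest_change_per_key(changes: Iterable[Row], key: str = "id") -> Dict[Hashable, Row]:
    """Keep only the most recent change for each key (hypothetical helper).

    update_preimage rows are skipped because they describe the old state
    of a record, not the state to apply downstream.
    """
    latest: Dict[Hashable, Row] = {}
    for row in changes:
        if row["_change_type"] == "update_preimage":
            continue
        current = latest.get(row[key])
        # A higher commit version wins; on a tie the later row in the input wins.
        if current is None or row["_commit_version"] >= current["_commit_version"]:
            latest[row[key]] = row
    return latest


# Made-up rows for the scenario in the question: record x updated in versions 6 and 7.
changes = [
    {"id": "x", "value": "v5", "_change_type": "update_preimage", "_commit_version": 6},
    {"id": "x", "value": "v6", "_change_type": "update_postimage", "_commit_version": 6},
    {"id": "x", "value": "v6", "_change_type": "update_preimage", "_commit_version": 7},
    {"id": "x", "value": "v7", "_change_type": "update_postimage", "_commit_version": 7},
]
result = latest_change_per_key(changes)

# Variant where version 7 deletes record x instead of updating it.
changes_with_delete = changes[:2] + [
    {"id": "x", "value": "v6", "_change_type": "delete", "_commit_version": 7},
]
result_with_delete = latest_change_per_key(changes_with_delete)
```

Under these assumptions, the surviving row for x is the version 7 `update_postimage` in the first case and the `delete` in the second. On a real Spark DataFrame the same idea would typically be expressed as a window partitioned by the key and ordered by `_commit_version` descending, keeping the first row per key.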