
Shay Palachy-Affek

06/29/2023, 12:25 PM
Hey! Nice to be here. 🙂 I have a question I'd love your help with: We (a data analytics and science team) are starting to manage our data on Databricks using the medallion architecture. We want to start pulling data (mainly from MongoDB collections) into Delta tables, with
delta.enableChangeDataFeed = true
set at the silver level, so that change history is retained. Obviously, from this point onwards we will have per-column change history for the chosen columns. However, for a few select, very important columns we do have a way of reconstructing all past value changes from raw data kept in a third data store (some object storage). We would like to use this to load all past changes into said Delta tables, as if the tables had been created a few years ago (instead of right now) and had been fed all these changes over time, so that from this point forward we can get both old and new changes from a single source through one API. Is this possible? Can we directly inject history into the CDF (Change Data Feed) of a table? The fact that the CDF schema has a
_commit_timestamp
field hints at this possibility (if we can just write rows into the CDF directly). Alternatively, can we write changes to the table while overriding the timestamp of each change, instead of it automatically being
now()
? If we can feed the change history from the past up to the present, we will end up with the same result. Thank you for reading this, Shay
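To make the second option concrete, here is a minimal sketch of what I mean by replaying past changes in order. It is plain Python with no Spark or Delta calls, and everything in it is hypothetical: the `Change` record, its field names, the `replay_batches` helper, and the choice of one batch per day are assumptions for illustration only. The explicit `changed_at` field is also an assumption, there to carry the original change time in case the commit timestamp cannot be overridden.

```python
from dataclasses import dataclass
from datetime import date, datetime
from itertools import groupby
from typing import Any, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Change:
    """One reconstructed historical change (hypothetical shape)."""
    key: str              # assumed primary key of the affected row
    column: str           # name of the column whose value changed
    value: Any            # value after the change
    changed_at: datetime  # original change time, rebuilt from the raw data


def replay_batches(changes: Iterable[Change]) -> Iterator[Tuple[date, List[Change]]]:
    """Yield (day, changes) batches, oldest first.

    Each batch stands for one write to the table, so the writes would
    happen in the same order as the original changes did.
    """
    ordered = sorted(changes, key=lambda c: c.changed_at)
    for day, group in groupby(ordered, key=lambda c: c.changed_at.date()):
        yield day, list(group)


# Made-up sample history, deliberately out of order.
history = [
    Change("u1", "status", "active", datetime(2021, 3, 2, 9, 0)),
    Change("u1", "status", "new", datetime(2021, 3, 1, 8, 0)),
    Change("u2", "status", "new", datetime(2021, 3, 2, 7, 30)),
]

batches = list(replay_batches(history))
for day, batch in batches:
    print(day, [(c.key, c.column, c.value) for c in batch])
```

With this, each batch would become one write to the silver table, oldest batch first, and `changed_at` would be stored as a regular column next to the data.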