
Ans Fida

01/27/2023, 6:53 PM
Question about Delta table versioning and time travel: If a Delta table has two versions, v1 and v2, does v2 contain everything that v1 had plus the difference between v1 and v2, or does v2 only contain the difference between v1 and v2?

Chris

01/27/2023, 6:53 PM
it’s a rolling set of changes, so cumulative

Will Jones

01/27/2023, 6:57 PM
It’s a little more complicated than that (for example, the answer is different depending on whether V2 was created from an Append transaction or a Delete transaction). What are you trying to understand? How much storage each version should take up? Or something else?
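To make Will's point concrete, here is a toy model of how versions relate. It is illustrative only, not Delta's real log format or code, and the file names are made up. Each commit file in `_delta_log` records only the actions of that one transaction, such as adding or removing data files. Reading version N means replaying commits 0 through N; Delta also writes periodic checkpoint files so readers don't have to replay everything. An append commit only adds files. A delete rewrites the affected files, so its commit removes the old file and adds a new one.

```python
# Toy model of a Delta transaction log (illustrative only, not the real format).
# Each entry is one commit/version; it lists only the data files that
# transaction added or removed.
log = [
    {"add": ["part-000.parquet"]},                                   # v0: initial write
    {"add": ["part-001.parquet"]},                                   # v1: append, adds a file
    {"remove": ["part-000.parquet"], "add": ["part-002.parquet"]},  # v2: delete rewrites part-000
]


def snapshot(log, version):
    """Replay commits 0..version and return the data files live at that version."""
    live = set()
    for commit in log[: version + 1]:
        for path in commit.get("remove", []):
            live.discard(path)
        for path in commit.get("add", []):
            live.add(path)
    return live


print(sorted(snapshot(log, 1)))  # ['part-000.parquet', 'part-001.parquet']
print(sorted(snapshot(log, 2)))  # ['part-001.parquet', 'part-002.parquet']
```

The commit for v2 on its own is only a diff, but the table state at v2 is cumulative. `part-000.parquet` is no longer live at v2, yet it stays in storage until `VACUUM` deletes it, which is why time travel back to v1 still works.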

Ans Fida

01/27/2023, 6:59 PM
I have a use case where I’d like to read data from a Delta table via the Delta connector for a streaming application. To make the job resilient and able to restart after failures, I want to use startingTimestamp to resume processing from the point where the job failed and process only NEW data from that point onwards. Any ideas on how I can do that? P.S. My Delta table is a collection of log files in an S3 bucket. There won’t be any updates or deletions to log files once uploaded, but new log files will arrive periodically.

Nick Karpov

01/27/2023, 7:19 PM
ya as long as your source delta table is append only, you can stream using the timestamp... i'd prefer to use startingVersion though b/c it's more explicit
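Nick's suggestion could be sketched as follows, assuming Spark Structured Streaming. The thread does not say which connector is used, but the docs linked later in the thread are the Spark ones. Restart resilience actually comes from the query's `checkpointLocation`. According to the Delta streaming docs, `startingVersion` and `startingTimestamp` only apply when a query starts for the first time and are ignored once progress has been recorded in its checkpoint. The function name, the paths and the choice of a Delta sink are assumptions for illustration.

```python
def start_stream_from_version(spark, source_path, starting_version,
                              checkpoint_path, sink_path):
    """Hypothetical helper: stream an append-only Delta table into a Delta sink."""
    stream = (
        spark.readStream.format("delta")
        # Only used on the very first run; later restarts resume from the checkpoint.
        .option("startingVersion", starting_version)
        .load(source_path)
    )
    return (
        stream.writeStream.format("delta")
        # The checkpoint records which source versions were already processed.
        .option("checkpointLocation", checkpoint_path)
        .start(sink_path)
    )
```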
if your source table is subject to updates/deletes, you can still accomplish this by enabling change data feed on the table and reading the change feed
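Here is a sketch of the change data feed route, assuming Spark Structured Streaming with Delta 2.x. The table name and helper functions are hypothetical. The feed only records changes committed after the property is enabled. Each emitted row carries `_change_type` (`insert`, `update_preimage`, `update_postimage`, `delete`), `_commit_version` and `_commit_timestamp` columns, so updates and deletes arrive as explicit change rows instead of breaking the stream.

```python
def enable_change_data_feed(spark, table_name):
    """Turn on change data feed; only commits made after this are recorded."""
    spark.sql(
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def read_change_feed_stream(spark, table_name, starting_version):
    """Stream row-level changes (inserts, updates, deletes) from starting_version on."""
    return (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
    )
```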

Ans Fida

01/27/2023, 7:21 PM
Do I have to specify in some configuration that the source Delta table is append-only?
Yeah, I’d eventually use `startingVersion` as it’s cleaner. As I understand it, `startingTimestamp` just ends up mapping to a version anyway, so using `startingVersion` helps avoid confusion.
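Ans's understanding matches the Delta streaming docs, which say that all changes committed at or after `startingTimestamp` (inclusive) are read. The following is a simplified model of that mapping, not Delta's implementation. The real code derives each version's timestamp from the transaction log and handles edge cases this sketch ignores, and the timestamps below are invented.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical commit timestamps: commit_times[v] is when version v was committed.
commit_times = [
    datetime(2023, 1, 27, 18, 0),   # v0
    datetime(2023, 1, 27, 18, 30),  # v1
    datetime(2023, 1, 27, 19, 0),   # v2
]


def version_for_starting_timestamp(commit_times, ts):
    """Return the first version committed at or after ts, or None if there is none.

    Assumes commit_times is sorted (non-decreasing).
    """
    i = bisect_left(commit_times, ts)  # first index with commit_times[i] >= ts
    return i if i < len(commit_times) else None


print(version_for_starting_timestamp(commit_times, datetime(2023, 1, 27, 18, 15)))  # 1
print(version_for_starting_timestamp(commit_times, datetime(2023, 1, 27, 18, 30)))  # 1 (inclusive)
print(version_for_starting_timestamp(commit_times, datetime(2023, 1, 27, 19, 30)))  # None
```

Passing a version directly skips this lookup, which is one reason an explicit `startingVersion` is easier to reason about.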

Nick Karpov

01/27/2023, 7:23 PM
> Do I have to specify in some configuration that the source delta table is append?
no, the stream will throw an exception if you do perform an update/delete https://docs.delta.io/2.2.0/delta-streaming.html#ignore-updates-and-deletes
and ya i like version better for that same reason, using timestamps makes me nervous 😬
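The options on the linked page (Delta 2.2 docs) are `ignoreDeletes` and `ignoreChanges`. `ignoreDeletes` skips transactions that delete data at partition boundaries. `ignoreChanges` covers that case too, and it also handles files rewritten by UPDATE, MERGE INTO, DELETE within partitions, or OVERWRITE: instead of failing, the stream re-emits those rewritten files, so unchanged rows can show up again downstream. The sketch below assumes Spark Structured Streaming; the function name and arguments are hypothetical.

```python
def read_stream_tolerating_rewrites(spark, source_path, starting_version):
    """Stream a Delta table without failing when upstream commits rewrite files."""
    return (
        spark.readStream.format("delta")
        .option("startingVersion", starting_version)
        # Re-emit rewritten files instead of throwing; downstream must tolerate
        # duplicate rows. ignoreChanges also covers what ignoreDeletes does.
        .option("ignoreChanges", "true")
        .load(source_path)
    )
```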

Ans Fida

01/27/2023, 7:35 PM
I think I understand what ignore deletes and updates is doing, but the naming is a little confusing. Intuitively, I'd expect ignoring updates to mean that no change is propagated downstream when an update happens. Instead, the update rewrites the file containing that change, and the stream propagates that entire file downstream as a change.
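The behavior Ans describes follows from copy-on-write data files. In the Delta version discussed here (2.2), an UPDATE does not edit a Parquet file in place. It writes a new copy of the whole file, and a stream with `ignoreChanges` emits every row of that new file. Below is a toy model of this, with plain Python lists standing in for data files; it is not Delta code.

```python
# Toy model: each "file" is a list of (id, value) rows.

def copy_on_write_update(file_rows, key, new_value):
    """UPDATE ... WHERE id = key: write a new file containing every row, changed or not."""
    return [(k, new_value if k == key else v) for k, v in file_rows]


def rows_emitted_with_ignore_changes(added_files):
    """With ignoreChanges, every row of every file added by the commit goes downstream."""
    return [row for file_rows in added_files for row in file_rows]


old_file = [(1, "a"), (2, "b"), (3, "c")]
new_file = copy_on_write_update(old_file, 2, "B")

print(rows_emitted_with_ignore_changes([new_file]))
# [(1, 'a'), (2, 'B'), (3, 'c')]: rows 1 and 3 are sent again although unchanged
```

So "ignore" here means "don't fail the stream", not "don't propagate". The removal of the old file is not sent downstream as a delete, so consumers need to deduplicate.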

Yousry Mohamed

01/27/2023, 7:57 PM
> Do I have to specify in some configuration that the source delta table is append?
Not required: for streaming you don’t need to modify anything on the source table. But to protect the table from accidental updates or deletes, you can set the table property `delta.appendOnly`.
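Setting the property Yousry mentions could look like this, assuming Spark SQL with Delta configured. With `delta.appendOnly` set to `true`, Delta rejects updates and deletes on the table while still allowing appends. The property can also be given in `TBLPROPERTIES` at `CREATE TABLE` time. For a path-based table like the S3 one in the question, Delta SQL accepts the ``delta.`<path>` `` identifier form. The helper name and bucket path below are hypothetical.

```python
def make_append_only(spark, table_name):
    """Set delta.appendOnly so that UPDATE/DELETE on the table fail instead of succeeding."""
    spark.sql(
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ('delta.appendOnly' = 'true')"
    )


# Usage with a hypothetical path-based table:
# make_append_only(spark, "delta.`s3://my-bucket/logs`")
```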