Matthew Powers

02/21/2023, 4:12 PM
Just noticed an interesting delta-rs / delta-spark difference. Delta Spark doesn’t let you instantiate a Delta Table with a specific table version, but delta-rs does.
• delta-rs: DeltaTable("../rust/tests/data/simple_table", version=2)
• delta-spark: DeltaTable.forPath(spark, "/path/to/table") (no version argument available)
Are there any implications of this difference we should think about?

Will Jones

02/21/2023, 4:17 PM
Yeah, in general delta-spark doesn’t make it easy to time travel on a Delta Table.
In a way that makes sense for that class though, as you wouldn’t want to call most of its methods (merge, optimize, etc.) on anything but the latest version.
Whereas the DeltaTable class we have has methods that make sense to call on past versions.
So that probably means we don’t want to add operations as methods on DeltaTable, and instead keep them as separate functions (or maybe another class).
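A minimal sketch of the split described here, assuming a toy model and not the actual delta-rs API: `TableSnapshot` and the free function `optimize` are hypothetical names invented for illustration. The handle pinned to a version only answers questions about itself, while the write operation lives outside the class and refuses to run on a past version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical stand-in for a table handle pinned to one version."""

    path: str
    version: int
    latest_version: int

    # Read-only methods are meaningful for any version, past or present.
    def is_latest(self) -> bool:
        return self.version == self.latest_version


def optimize(snapshot: TableSnapshot) -> int:
    """Hypothetical write operation kept outside the class.

    Returns the version number the new commit would receive, and refuses
    to run on a snapshot pinned to a past version.
    """
    if not snapshot.is_latest():
        raise ValueError(
            f"cannot optimize version {snapshot.version}; "
            f"latest is {snapshot.latest_version}"
        )
    return snapshot.latest_version + 1
```

With this shape, something like `optimize(TableSnapshot("t", 2, 5))` fails loudly instead of silently operating on stale state, and the snapshot class itself never grows write methods.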

Matthew Powers

02/21/2023, 6:23 PM
Yea, that makes sense. DeltaTable("../rust/tests/data/simple_table", version=2).optimize() would be weird. I’m just wondering if DeltaTable("../rust/tests/data/simple_table", version=2) would ever back us into a corner. Perhaps with something like deletion vectors if they don’t count as a “new version” (I’m not saying that’s the case, just brainstorming out loud).
👍 1

Will Jones

02/21/2023, 6:26 PM
Yeah I think the distinction to make is that DeltaTable represents a table at some particular time, and not the table in general. The only implication I can think of right now is that we shouldn’t implement operations on the table as methods, but that the methods should just be for extracting information from the log.
> Perhaps with something like deletion vectors if they don’t count as a “new version” (I’m not saying that’s the case, just brainstorming out loud).
It should always be sound. Any change to the table, no matter how small, creates a new transaction / log file / version.
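To make the "every change is a new log file / version" point concrete, here is a toy sketch using only the standard library. It assumes a heavily simplified log: the zero-padded 20-digit file names and the add/remove actions mirror the Delta log layout, but real commits carry more fields plus metadata, protocol actions, and checkpoints, all omitted here. The helpers `commit` and `files_at` are hypothetical names, not part of delta-rs.

```python
import json
import tempfile
from pathlib import Path


def commit(log_dir: Path, actions: list[dict]) -> int:
    """Write the actions as the next numbered log file; return its version."""
    version = len(list(log_dir.glob("*.json")))
    lines = "\n".join(json.dumps(a) for a in actions)
    (log_dir / f"{version:020d}.json").write_text(lines)
    return version


def files_at(log_dir: Path, version: int) -> set[str]:
    """Replay commits 0..version and return the set of live data files."""
    live: set[str] = set()
    for v in range(version + 1):
        text = (log_dir / f"{v:020d}.json").read_text()
        for line in text.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Build a three-commit toy log in a temporary directory.
log_dir = Path(tempfile.mkdtemp()) / "_delta_log"
log_dir.mkdir()
commit(log_dir, [{"add": {"path": "a.parquet"}}])     # version 0
commit(log_dir, [{"add": {"path": "b.parquet"}}])     # version 1
commit(log_dir, [{"remove": {"path": "a.parquet"}}])  # version 2
```

Because a past version is just a replay of the log up to that commit, `files_at(log_dir, 1)` keeps giving the same answer no matter how many commits are appended later, which is why pinning a handle to a version stays sound in this model.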

Matthew Powers

02/21/2023, 6:28 PM
That all makes sense @Will Jones! Thanks for the wisdom as always.

rtyler

02/21/2023, 7:33 PM
> Yeah I think the distinction to make is that DeltaTable represents a table at some particular time, and not the table in general.
FWIW I think this is a good decision we made in delta-rs and think it's a missed opportunity in delta-spark. (I like OOP but sometimes it encourages what might otherwise be silly decisions)
👍 1