Matthew Powers

05/30/2023, 8:28 PM
Z Ordering is cool, but I don't think it's always necessary. Often, sorting the data is all you need. For example, sorting works well for tables that are always filtered by the same column. Hierarchical sorting (sort by one column, then break ties with the next) is also good in a lot of cases. I'm assuming sorting would be easier to implement than Z Ordering (and perhaps easier to explain and use). What do folks think about a
delta_table.optimize.sort(["col1", "col2"])
method that hierarchically sorts the data? We can also add a
delta_table.optimize.z_order
method of course.
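For anyone comparing the two layouts, here is a minimal sketch in plain Python, not the delta-rs implementation. It assumes two small non-negative integer columns, and the helper name `interleave_bits` is hypothetical. Hierarchical sorting orders rows by the first column and only then by the second, while Z-ordering sorts on a key built by interleaving the bits of both columns.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Build a Z-order (Morton) key by interleaving the bits of x and y.

    Hypothetical helper for illustration; assumes 0 <= x, y < 2**bits.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


# A 4x4 grid of (col1, col2) values.
rows = [(x, y) for x in range(4) for y in range(4)]

# Hierarchical sort: order by col1, then by col2 within equal col1.
hierarchical = sorted(rows)

# Z-order: order by the interleaved key, so rows close in both
# columns tend to land near each other.
z_ordered = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))

print(hierarchical[:4])
print(z_ordered[:4])
```

With the hierarchical layout, a filter on col1 touches one contiguous run of rows, but a filter on col2 alone touches rows spread across the whole table. The Z-ordered layout trades some of that col1 locality for better locality on col2.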

Will Jones

05/30/2023, 8:58 PM
I think I'll go for Z-order first, since it doesn't seem that hard to implement. I wrote the steps here: https://github.com/delta-io/delta-rs/issues/1127#issuecomment-1569091839
However, I do want to eventually explore a pure sort implementation too, since I think we can make that idempotent and return early if the data is already sorted. I think there are good use cases for sorted data that we should explore in the long term.
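A rough sketch of the idempotent, early-return idea, using in-memory rows as a stand-in for table data. The function names `is_sorted` and `sort_if_needed` are hypothetical and are not part of delta-rs; a real implementation would more likely check file-level statistics than scan every row.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def is_sorted(rows: Sequence[Row], columns: Sequence[str]) -> bool:
    """Return True if rows are already in hierarchical order on `columns`."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def sort_if_needed(rows: List[Row], columns: Sequence[str]) -> Tuple[List[Row], bool]:
    """Sort rows hierarchically, skipping the work when already sorted.

    Returns the rows and a flag saying whether a rewrite happened.
    """
    if is_sorted(rows, columns):
        return rows, False  # early return: nothing to rewrite
    ordered = sorted(rows, key=lambda row: tuple(row[c] for c in columns))
    return ordered, True


data = [
    {"col1": 2, "col2": "b"},
    {"col1": 1, "col2": "z"},
    {"col1": 1, "col2": "a"},
]

first_pass, rewritten_first = sort_if_needed(data, ["col1", "col2"])
second_pass, rewritten_second = sort_if_needed(first_pass, ["col1", "col2"])
print(rewritten_first, rewritten_second)
```

Running the operation a second time finds the data already sorted and does no work, which is the property that would make a sort-based optimize safe to call repeatedly.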

Matthew Powers

05/30/2023, 9:09 PM
Sweet, sounds great, thank you