https://delta.io logo
m

Mohan

08/04/2023, 6:28 PM
Hi All, Has anyone tried Z-ORDER ing based on concatenated cols of a delta table. Example : if you have a table with the following columns: Order_id Order_dt Order_total And you want to Z-order the table by Order_id and Order_dt using the following syntax: ZORDER BY Order_id + Order_dt. what will be the performance implications ?
d

Dominique Brezinski

08/04/2023, 7:50 PM
You know you can z-order multiple columns, right?
m

Mohan

08/04/2023, 8:26 PM
yes I do know, but wanted to know how will be the performance with more than cols. have not tried.
I mean diff between 2 ( Order_dt,Order_id) cols vs concatenated cols ( Order_dt + Order_id).
d

Dominique Brezinski

08/04/2023, 8:29 PM
There is a decreasing return with the number of column included. 3 columns is a reasonable limit. You want them to be high cardinality columns, so including a date column for example is not the best usually, but if you can run a few experiments on your data and query patterns that is the best thing to do.
m

Mohan

08/04/2023, 8:37 PM
I got it. the combination ZORDER BY Order_id + Order_dt has high cardinality. will his be an option to try out. Pls let me know your thoughts
d

Dominique Brezinski

08/04/2023, 9:06 PM
The syntax is
OPTIMIZE table_name ZORDER BY (Order_id, Order_dt)
That does the order over both columns. It is not a hierarchical sort, so it is essentially clustering by both column values across the curve--values of both columns are parameters to the continuous function.
m

Mohan

08/04/2023, 9:11 PM
Thanks @Dominique Brezinski