
Godel Kurt

03/09/2023, 9:29 AM
Hi everyone. Does Z-ordering data by specific columns into only one or two files, each around 1GB, speed up queries? Delta selects files using the metadata from Z-ordering, but I don't know whether performance improves if there's only one file.

Yousry Mohamed

03/09/2023, 9:44 AM
Z-Ordering is a technique that helps with data skipping, which in the case of Delta means the column-level stats are used to completely skip certain parquet files because they don't satisfy the criteria of a query. Going to the extreme of two or even one file means Z-Ordering is completely useless, because there is little or nothing left to skip. If you have 8 files, for example, and the data your query needs sits in a single parquet file, then the query will be much faster because it reads only 1/8th of the total data volume. Have a look at this post for some ideas. https://yousry.medium.com/delta-lake-z-ordering-from-a-to-z-315063a42031
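To make the 8-files-versus-1-file point concrete, here is a minimal sketch of min/max file skipping in plain Python. It is not Delta's implementation: `FileStats`, `files_to_read`, and `split_into_files` are hypothetical names, and sorting a single column stands in for real Z-ordering (which interleaves several columns so their per-file ranges stay tight).

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file stats for one column: min, max, and row count."""
    path: str
    min_value: int
    max_value: int
    num_rows: int


def files_to_read(files, lo, hi):
    """Keep only files whose [min, max] range overlaps lo <= col <= hi.

    Any file whose range cannot contain a matching row is skipped entirely.
    """
    return [f for f in files if f.max_value >= lo and f.min_value <= hi]


def split_into_files(values, n_files):
    """Hypothetical layout helper: sort the values (a single-column stand-in
    for clustering) and cut them into n_files contiguous chunks."""
    values = sorted(values)
    size = len(values) // n_files
    files = []
    for i in range(n_files):
        # The last file takes any remainder rows.
        chunk = values[i * size:(i + 1) * size] if i < n_files - 1 else values[i * size:]
        files.append(FileStats(f"part-{i:05d}.parquet", chunk[0], chunk[-1], len(chunk)))
    return files


values = list(range(800_000))
for n in (8, 1):
    files = split_into_files(values, n)
    selected = files_to_read(files, 100_000, 150_000)
    rows = sum(f.num_rows for f in selected)
    print(f"{n} file(s): read {len(selected)} file(s), {rows} of {len(values)} rows")
```

With 8 clustered files the predicate touches one file (100,000 of 800,000 rows, i.e. 1/8th); with a single file the stats can never exclude it, so every matching query reads all 800,000 rows. A range that straddles file boundaries would read more than one file, so 1/8th is the best case here.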

Jim Hibbard

03/26/2023, 8:36 AM
Nick Karpov recently made a great post on Z-ordering as well. Definitely worth a read if you want a good intuition for when the feature is useful. https://www.linkedin.com/feed/update/urn:li:activity:7042560428870156288/