Godel Kurt

03/09/2023, 9:29 AM
Hi everyone. Does Z-ordering by specific columns still speed up queries when the data ends up in only one or two files, each around 1 GB? Delta selects files using the metadata from Z-order, but I don't know whether performance improves if there's only one file.

Yousry Mohamed

03/09/2023, 9:44 AM
Z-Ordering is a technique that helps with data skipping, which in the case of Delta means the column-level stats are used to completely skip Parquet files that don't satisfy a query's criteria. Going to the extreme of two or even one file means Z-Ordering is completely useless, because there are no files left to skip. If you have 8 files of similar size, for example, and the data your query needs sits in a single Parquet file, the query will be much faster because it reads roughly 1/8th of the total data volume. Have a look at this post for some ideas.
👍 1
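
To make the file-skipping argument above concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta's actual implementation: the `FileStats` class, the `make_layout` helper, and the file names are all hypothetical, and it assumes the data is perfectly clustered on one integer column so that each file covers a disjoint min/max range.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical stand-in for the per-file column stats kept in the table log."""
    path: str
    min_val: int
    max_val: int
    size_gb: float


def make_layout(num_files, total_rows=800, total_gb=2.0):
    """Split a sorted key range 0..total_rows-1 evenly across num_files files."""
    rows_per_file = total_rows // num_files
    return [
        FileStats(
            path=f"part-{i:05d}.parquet",
            min_val=i * rows_per_file,
            max_val=(i + 1) * rows_per_file - 1,
            size_gb=total_gb / num_files,
        )
        for i in range(num_files)
    ]


def files_to_scan(files, value):
    """Keep only files whose [min, max] range could contain `value`."""
    return [f for f in files if f.min_val <= value <= f.max_val]


def scanned_fraction(files, value):
    """Fraction of the total data volume that must be read for `col = value`."""
    hit = files_to_scan(files, value)
    return sum(f.size_gb for f in hit) / sum(f.size_gb for f in files)


eight = make_layout(8)  # eight files of 0.25 GB each
one = make_layout(1)    # a single 2 GB file

print(scanned_fraction(eight, 42))  # 0.125: only one of eight files is read
print(scanned_fraction(one, 42))    # 1.0: the only file must be read in full
```

With eight files the lookup touches one file and skips the other seven. With a single file the min/max range always matches, so file-level skipping has nothing to work with.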

Jim Hibbard

03/26/2023, 8:36 AM
Nick Karpov recently wrote a great post on Z-ordering as well. Definitely worth a read if you want a good intuition for when the feature is useful.
🎯 1