Hi everyone. If a dataset is Z-ordered by specific columns and ends up in only one or two files, each around 1GB, does that speed up queries? Delta selects the files using the metadata from Z-ordering, but I don't know whether performance improves if there's only one file.
Yousry Mohamed
03/09/2023, 9:44 AM
Z-Ordering is a technique that helps with data skipping, which in the case of Delta means the column-level stats are used to completely skip parquet files that don't satisfy the criteria of a query. Going to the extreme of two or even one file means Z-Ordering is completely useless, because there are no files left to skip.
If you have 8 files, for example, and the data your query needs sits in a single parquet file, then the query will be much faster because it reads only 1/8th of the total data volume. Have a look at this post for some ideas. https://yousry.medium.com/delta-lake-z-ordering-from-a-to-z-315063a42031
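To make the 8-files-versus-1-file point concrete, here is a rough sketch in plain Python of how min/max file stats allow skipping. This is only a toy model, not Delta's actual implementation: `FileStats`, `files_to_scan`, the value ranges and the file sizes are all made up for illustration, and it assumes the data is clustered well enough that the per-file ranges don't overlap.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file stats: min/max of one column plus file size."""
    name: str
    min_val: int
    max_val: int
    size_gb: float


def files_to_scan(files, value):
    """Keep only files whose [min, max] range could contain the value."""
    return [f for f in files if f.min_val <= value <= f.max_val]


# 1 GB of data split into 8 files with non-overlapping ranges
# (assumed layout: 0-99, 100-199, ..., 700-799).
eight_files = [
    FileStats(f"part-{i}", i * 100, i * 100 + 99, 0.125) for i in range(8)
]

# The same data compacted into a single 1 GB file.
one_file = [FileStats("part-0", 0, 799, 1.0)]

# Equality filter on the clustered column, e.g. WHERE col = 250.
hit_many = files_to_scan(eight_files, 250)
hit_one = files_to_scan(one_file, 250)

scanned_many = sum(f.size_gb for f in hit_many)
scanned_one = sum(f.size_gb for f in hit_one)

print([f.name for f in hit_many], scanned_many)
print([f.name for f in hit_one], scanned_one)
```

With 8 files, only the one file whose range covers the value has to be read. With a single file, that file's range always matches, so the whole thing is read no matter how the rows are ordered inside it.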