Naama Gal-Or
01/17/2023, 8:34 AM
deltaTable.optimize().executeZOrderBy to sort records within each of the Parquet files according to the sort column? At the moment it seems like the data within each file is not sorted by the sort column, and this affects our file sizes because similar records are not located next to each other.
JosephK (exDatabricks)
01/17/2023, 12:32 PM
Naama Gal-Or
01/17/2023, 12:34 PM
Naama Gal-Or
01/17/2023, 5:59 PM
repartitionByRange, so no sorting option if I understand correctly. https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/skipping/MultiDimClustering.scala#L50
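
A possible workaround, not something confirmed in this thread: since range partitioning only decides which file a row lands in, the rows could be sorted explicitly before writing. The sketch below uses the standard Spark DataFrame calls repartitionByRange and sortWithinPartitions. Note this is a plain linear sort on one column, not Z-ordering, and the function name, column name, file count, and target path are all hypothetical.

```python
def write_range_sorted(df, sort_column, num_files, target_path):
    """Write `df` as a Delta table whose files are each sorted by `sort_column`.

    Assumption: `df` is a pyspark.sql.DataFrame and the Delta Lake
    package is available on the cluster.
    """
    # Range-partition so each output file covers a contiguous key range.
    ranged = df.repartitionByRange(num_files, sort_column)
    # Sort rows inside each partition; there is no global shuffle here,
    # so each partition (one output file) ends up ordered by the key.
    ordered = ranged.sortWithinPartitions(sort_column)
    # Write to a separate (hypothetical) location instead of rewriting in place.
    ordered.write.format("delta").mode("overwrite").save(target_path)
```

Usage would look something like write_range_sorted(spark.read.format("delta").load(source_path), "event_id", 16, "/tmp/events_sorted"), where source_path and event_id are placeholders. Whether this actually shrinks the files depends on the data and the Parquet encoding, so it is worth comparing file sizes before and after.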