Ajex
01/31/2023, 10:39 AM
I ran
spark.sql("ALTER TABLE delta.`/user/fplay/temp/testraw/raw_logs.delta` CHANGE COLUMN user_id user_id STRING FIRST")
to move the column I need to Z-order by to index 0 (the first column).
Even so, the newest Delta log version still has no statistics for that column.
Any help, please!
Ryan Zhu
01/31/2023, 4:23 PM
What does
spark.read.format("delta").load("/user/fplay/temp/testraw/raw_logs.delta").schema
return? Is user_id the first column? In addition, which Delta version are you using? There is a bug where Delta stats pick up the first 32 columns of the schema of your ingestion data rather than of the table schema. Delta 2.2.0 fixed this issue ( https://github.com/delta-io/delta/commit/67bf022d3e4d8fc7c17c12be7875b855283b6996 )
Ajex
02/01/2023, 12:10 AM
I ran
spark.sql("ALTER TABLE delta.`/user/fplay/temp/testraw/raw_logs.delta` CHANGE COLUMN user_id user_id STRING FIRST")
to move the column I need to Z-order by to index 0 (the first column).
Then I ran Z-order and it worked: the Delta logs have statistics for "user_id". But when new data (not yet Z-ordered) is written to that location, the Delta logs have no statistics for "user_id".
In other words, only the data I ran Z-order on has statistics for "user_id"; new data written to /user/fplay/temp/testraw/raw_logs.delta has no statistics for "user_id".
Ryan Zhu
02/01/2023, 3:45 AM
Ajex
02/01/2023, 4:07 AM
spark.read.format("delta").load("/user/fplay/temp/testraw/raw_logs.delta").schema
The code above prints user_id in the first position.
Ryan Zhu
02/01/2023, 4:18 AM
Ajex
02/01/2023, 4:22 AM
Ryan Zhu
02/01/2023, 4:35 AM
Ajex
02/01/2023, 4:45 AM
Ryan Zhu
02/01/2023, 4:46 AM
Ajex
02/01/2023, 5:00 AM
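The check discussed in this thread (whether a given commit in the Delta log carries statistics for "user_id") could be sketched roughly as below. It assumes the lines of one commit file from the table's _delta_log directory have already been read into a list of strings. The helper name files_missing_stats and the sample commit content are hypothetical, made up for illustration; only the "add" action layout, with "stats" stored as a JSON-encoded string holding minValues / maxValues / nullCount, follows the Delta log format. Only top-level columns are handled.

```python
import json


def files_missing_stats(commit_lines, column):
    """Return paths of `add` actions that carry no statistics for `column`.

    Hypothetical helper for illustration; handles top-level columns only.
    """
    missing = []
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        add = action.get("add")
        if add is None:
            continue  # not a file addition (commitInfo, metaData, remove, ...)
        # `stats` is a JSON-encoded string inside the action and may be absent.
        stats = json.loads(add["stats"]) if add.get("stats") else {}
        has_stats = any(
            column in stats.get(key, {})
            for key in ("minValues", "maxValues", "nullCount")
        )
        if not has_stats:
            missing.append(add["path"])
    return missing


# Made-up commit content: one file with user_id stats, one without.
sample_commit = [
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {"path": "part-0000.parquet", "stats": json.dumps({
        "numRecords": 2,
        "minValues": {"user_id": "a"},
        "maxValues": {"user_id": "b"},
        "nullCount": {"user_id": 0},
    })}}),
    json.dumps({"add": {"path": "part-0001.parquet", "stats": json.dumps({
        "numRecords": 2,
        "minValues": {"ts": 1},
        "maxValues": {"ts": 2},
        "nullCount": {"ts": 0},
    })}}),
]

print(files_missing_stats(sample_commit, "user_id"))  # ['part-0001.parquet']
```

Running this over the commit written by the Z-order job and over the commit written by a later ingestion job would show whether only the Z-ordered files carry "user_id" statistics, which is the symptom described above.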