
Gal Stainfeld

06/01/2023, 11:40 AM
Hi guys. Is there a good way to tell whether a Delta table is Z-ordered by certain fields? I saw that I can use the table history and filter on the OPTIMIZE operation with operationParameters.zOrderBy. That worked for me locally after I Z-ordered a table, but when I tried it on a table that I know for sure was Z-ordered in the past (half a year ago), it returned empty results. Is it possible for a table to simply lose its Z-ordering when the only operations run on it are adds and updates? Thanks
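A minimal sketch of the history filter described above. It assumes the history rows have already been collected into plain Python dicts (for example from DESCRIBE HISTORY), and that `zOrderBy` is stored inside `operationParameters` as a JSON-encoded list string. The helper name `zorder_commits` and the sample rows are hypothetical.

```python
import json


def zorder_commits(history_rows):
    """Return (version, columns) for each OPTIMIZE commit that recorded Z-order columns."""
    found = []
    for row in history_rows:
        if row.get("operation") != "OPTIMIZE":
            continue
        params = row.get("operationParameters") or {}
        raw = params.get("zOrderBy") or "[]"
        # Assumption: the value is a JSON string such as '["col_a","col_b"]'.
        if isinstance(raw, str):
            try:
                columns = json.loads(raw)
            except json.JSONDecodeError:
                columns = []
        else:
            columns = list(raw)
        if columns:
            found.append((row["version"], columns))
    return found


# Hypothetical history rows, newest first.
sample_history = [
    {"version": 12, "operation": "MERGE", "operationParameters": {}},
    {"version": 11, "operation": "OPTIMIZE",
     "operationParameters": {"predicate": "[]", "zOrderBy": '["customer_id"]'}},
    {"version": 10, "operation": "OPTIMIZE",
     "operationParameters": {"predicate": "[]", "zOrderBy": "[]"}},
]

print(zorder_commits(sample_history))
```

An empty result only means that no such commit is visible in the retained history; it does not show that the files were never Z-ordered.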

Gerhard Brueckl

06/01/2023, 12:10 PM
Technically, Z-ordering happens on the Parquet file(s), and I do not think Delta stores metadata on whether a file is Z-ordered or not. You might find some hints in the Delta log (as you already mentioned), but that's of course not reliable, as the history could have been cleaned/vacuumed.

Gal Stainfeld

06/01/2023, 12:40 PM
Thanks @Gerhard Brueckl. Can I check for sure whether a table was cleaned or vacuumed? Is there any other way to verify that a table is Z-ordered? Since it's an expensive operation, I don't want to run it if it's not needed. Also, just to verify: once a table has been Z-ordered, does it need to be Z-ordered again after every merge (update/write), or is doing it once enough? Thanks again!

Gerhard Brueckl

06/01/2023, 12:42 PM
If I remember correctly, Delta does not re-optimize an already optimized table/partition, and this should then also apply to Z-ordering. But then it must be stored somewhere whether a table/partition is Z-ordered or not 🤔

nixent

06/01/2023, 12:47 PM
You can find the Z-order in the operation column of the Delta history; try running DESCRIBE HISTORY.

Gerhard Brueckl

06/01/2023, 12:51 PM
Sure, if the Delta history is present 🙂 But that's not necessarily the case.
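One rough way to check whether older history is gone, sketched under the assumption that the history rows are available as plain dicts with a `version` field: if the earliest retained version is greater than 0, earlier commits are no longer visible. The helper names and the sample data are hypothetical. Log cleanup is governed by the table property `delta.logRetentionDuration` (30 days by default), so a six-month-old OPTIMIZE commit would usually have aged out.

```python
def earliest_retained_version(history_rows):
    """Return the smallest commit version still present in the history, or None if empty."""
    versions = [row["version"] for row in history_rows]
    return min(versions) if versions else None


def history_looks_truncated(history_rows):
    """True when the history no longer reaches back to version 0."""
    earliest = earliest_retained_version(history_rows)
    return earliest is not None and earliest > 0


# Hypothetical rows: the oldest visible commit is version 340, so older entries are gone.
retained = [{"version": v, "operation": "MERGE"} for v in range(340, 345)]
print(history_looks_truncated(retained))
```

This is only a heuristic: it shows that older commits have dropped out of the history, not what those commits contained.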

Gal Stainfeld

06/01/2023, 1:13 PM
@Gerhard Brueckl As far as I understand, Delta does not allow autoCompaction together with Z-ordering, so I assumed the Z-order is maintained through the lifetime of the table, i.e. preserved with every merge operation. Was I wrong about that? Does one need to re-apply Z-ordering after every merge operation on the table in order to preserve the ordering? @nixent I tried it; in my case it looks like I don't have the history from the time it was Z-ordered.

Gerhard Brueckl

06/01/2023, 1:15 PM
Yes, autoCompaction does not support Z-ordering (you did not mention before that you were talking about autoCompaction).
As I said, the Z-ordering is persisted with the Parquet file. If your update statement removes the Z-ordered Parquet file from the latest version of your table and adds the data to a new Parquet file, you would need to run OPTIMIZE/Z-ordering again.

Gal Stainfeld

06/01/2023, 1:20 PM
Maybe a stupid follow-up question, but I want to make sure I fully understand: how do I know whether the update statement removed the Z-ordered Parquet file from the latest version of the table or not?

Gerhard Brueckl

06/01/2023, 1:22 PM
You don't, and you also should not need to worry about this; you can just re-run the OPTIMIZE operation. Delta will decide whether OPTIMIZE/Z-ordering needs to be reapplied.

Gal Stainfeld

07/20/2023, 6:51 AM
@Gerhard Brueckl Thanks again for your answer. I would like to ask a small follow-up question, if I may. We have now built a maintenance mechanism that is also responsible for Z-ordering, compacting, and vacuuming a Delta table. When doing all three, what would be the ideal order and why?

Gerhard Brueckl

07/20/2023, 12:00 PM
Z-ordering/compacting followed by VACUUM. But it does not matter too much, as VACUUM hard-deletes the files that were soft-deleted some time ago, so the operations do not really conflict if you have a longer retention period configured (default 7 days).
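The ordering above could be sketched as a small helper that builds the SQL statements for one maintenance run. The function name `maintenance_statements`, the table name `events`, and the column names are hypothetical; 168 hours corresponds to the 7-day default retention mentioned above.

```python
def maintenance_statements(table, zorder_columns, retain_hours=168):
    """Build the maintenance SQL in the suggested order: OPTIMIZE/ZORDER first, VACUUM last."""
    if not zorder_columns:
        raise ValueError("at least one Z-order column is required")
    columns = ", ".join(zorder_columns)
    return [
        # Compacts small files and Z-orders the rewritten files in one pass.
        f"OPTIMIZE {table} ZORDER BY ({columns})",
        # Hard-deletes files that were removed from the table longer ago than the retention window.
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]


for statement in maintenance_statements("events", ["customer_id", "event_date"]):
    print(statement)
```

Each statement could then be passed to `spark.sql(...)` on a Delta-enabled Spark session. Files replaced by the OPTIMIZE run are only soft-deleted at that point, so the VACUUM in the same run does not remove them until they are older than the retention window.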

Gal Stainfeld

07/20/2023, 1:00 PM
OK, thanks. I have done that on a table that is the source of a streaming process. Would this cause the stream to read all of the table's files again when it is re-initialized?

Gerhard Brueckl

07/20/2023, 1:04 PM
Optimized files should not be read again by your stream, if that's the question. If you initialize a new stream, it will only read the latest version.
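Why compaction is not re-read can be sketched from the transaction log: files written by OPTIMIZE are recorded as `add` actions with `dataChange` set to false, and a running stream only picks up adds that change data. The snippet below is a simplified model of that filter over hypothetical log lines, not Delta's actual implementation; the function name and file names are made up.

```python
import json


def files_for_running_stream(commit_lines):
    """Return the paths of added files that a running stream would treat as new data."""
    new_files = []
    for line in commit_lines:
        action = json.loads(line)
        add = action.get("add")
        # Rewrites such as OPTIMIZE mark their files with dataChange=false.
        if add and add.get("dataChange", True):
            new_files.append(add["path"])
    return new_files


# Hypothetical OPTIMIZE commit: two small files replaced by one compacted file.
optimize_commit = [
    '{"remove": {"path": "part-0001.parquet", "dataChange": false}}',
    '{"remove": {"path": "part-0002.parquet", "dataChange": false}}',
    '{"add": {"path": "part-0003-compacted.parquet", "dataChange": false}}',
]
# Hypothetical append commit with genuinely new rows.
append_commit = [
    '{"add": {"path": "part-0004.parquet", "dataChange": true}}',
]

print(files_for_running_stream(optimize_commit))
print(files_for_running_stream(append_commit))
```

A newly initialized stream is a different case: it starts from a snapshot of the latest table version, so it reads the files that are current at that point.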

Gal Stainfeld

07/20/2023, 1:06 PM
Could it be related to the fact that the table was (also) repartitioned, then?

Gerhard Brueckl

07/20/2023, 1:08 PM
If you repartitioned it, you had to completely rewrite it, no?