
Godel Kurt

03/23/2023, 2:40 AM
Hi folks. I ran zorder and compact on a file of 1MB, but found that the job processed 10k partitions in one of the stages. Is there any way to configure the number of partitions at this point?

Kashyap Bhatt

03/23/2023, 3:01 AM
Not quite sure what you mean, but if you just ran OPTIMIZE my_table then it'll run on the whole table. You can narrow it down with a WHERE clause to select only the data that changed:
OPTIMIZE events WHERE partition_date_column = '2021-11-18' ZORDER BY (eventType)
https://docs.delta.io/latest/optimizations-oss.html#z-ordering-multi-dimensional-clustering
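If you build these statements from a script, something like the sketch below might help. It only assembles the OPTIMIZE statement as a string; build_optimize_sql is a hypothetical helper name (not part of Delta Lake), and the table, filter, and column are the ones from the example above. Running the statement assumes a SparkSession that is already configured for Delta Lake.

```python
def build_optimize_sql(table, where=None, zorder_by=()):
    """Assemble a Delta Lake OPTIMIZE statement (hypothetical helper)."""
    sql = f"OPTIMIZE {table}"
    if where:
        # Restrict the command to the rows/partitions matching the filter.
        sql += f" WHERE {where}"
    if zorder_by:
        # Z-order the selected data by the given columns.
        sql += f" ZORDER BY ({', '.join(zorder_by)})"
    return sql


stmt = build_optimize_sql(
    "events",
    where="partition_date_column = '2021-11-18'",
    zorder_by=["eventType"],
)
print(stmt)
# Assumption: `spark` is an active SparkSession with Delta Lake enabled.
# spark.sql(stmt)
```

Without the where argument the statement covers the whole table, same as a plain OPTIMIZE events.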

Godel Kurt

03/23/2023, 4:01 AM
Thanks. I meant that the zordering job had multiple stages, and one of them ran 10k tasks (one task per partition). I observed that the same number, 10k, also appears when zordering larger datasets. I don't know how to reduce such a large number of partitions.

Kashyap Bhatt

03/23/2023, 2:23 PM
I ran zorder and compact on a file of 1MB
This is the part that's unclear to me. When you run OPTIMIZE, you just provide the name of the table and, optionally, some filters to select the data you want to optimize. Delta then decides which files need to be touched. You don't tell Delta OPTIMIZE <this-specific-file>. Perhaps posting the command you executed would help.