
Nitya Thacker

04/13/2023, 9:07 PM
I am trying to run Z-ordering on a Delta table and am running into an error. Any clues?
Error - org.apache.spark.SparkUnsupportedOperationException: Cannot evaluate expression: rangepartitionid(input[14, string, true], 1000)
Versions - delta 2.1.0, spark version '3.3.0+amzn.1.dev0'
Fails at - mytable.optimize().where("parititionfield=somevalue").executeZOrderBy("col1", "col2")

Matthew Powers

04/13/2023, 9:31 PM
What’s your code? What Delta version are you using?

Nitya Thacker

04/13/2023, 9:42 PM
@Matthew Powers as mentioned in the post, the Delta version is 2.1.0 (OSS). For code, I am running the command mytable.optimize().where("parititionfield=somevalue").executeZOrderBy("col1", "col2") from my Zeppelin notebook on an AWS EMR cluster.

Matthew Powers

04/13/2023, 9:42 PM
What’s the error message?

Nitya Thacker

04/13/2023, 9:43 PM
org.apache.spark.SparkUnsupportedOperationException: Cannot evaluate expression: rangepartitionid(input[14, string, true], 1000)

TD

04/13/2023, 9:53 PM
Huh! Does it work locally with a locally installed Apache Spark 3.3?

Nitya Thacker

04/13/2023, 9:55 PM
The table is huge and sits on S3, so I haven't even tried locally; I keep running into OOM 😞
{numFiles -> 18162, numOutputRows -> 3306081997, numOutputBytes -> 2024184762876}
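For scale, a quick back-of-the-envelope on those metrics in plain Python. The three numbers are copied from the message above; the choice of binary units (TiB, MiB) is ours.

```python
# Rough size math for the table metrics quoted above.
metrics = {
    "numFiles": 18162,
    "numOutputRows": 3306081997,
    "numOutputBytes": 2024184762876,
}

total_tib = metrics["numOutputBytes"] / 1024**4
avg_file_mib = metrics["numOutputBytes"] / metrics["numFiles"] / 1024**2
avg_rows_per_file = metrics["numOutputRows"] / metrics["numFiles"]

print(f"total size      ~ {total_tib:.2f} TiB")
print(f"avg file size   ~ {avg_file_mib:.1f} MiB")
print(f"avg rows / file ~ {avg_rows_per_file:,.0f}")
```

That works out to roughly 1.84 TiB spread over 18,162 files, or about 106 MiB and 182k rows per file on average.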
@TD I tried with a small sample locally (Spark 3.3.0, Delta 2.1.0) and get the same error.
Tried using 1 column instead of 2: same error.
@TD it appears that it does not like the column I'm trying to use for Z-ordering. "input[14, string, true]" refers to the column at ordinal 14 (0-based, so the 15th input column). Are there any restrictions on the columns that can be used for Z-ordering? Just to test, I tried another int column and got a different error (will be ineffective, because we currently do not collect stats for these columns.)
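In Spark's plan and error strings, text like input[14, string, true] is how a bound column reference prints: input[ordinal, dataType, nullable]. The ordinal is 0-based and indexes the input row of the operator evaluating the expression; that row usually, but not necessarily, follows the table schema, so mapping it back to a column name is a guess. A small sketch that pulls the reference out of an error message; parse_bound_ref, guess_column, and the c0..c19 column list are hypothetical helpers for illustration, and the regex only handles type names without commas (not decimal(10,2) or struct types).

```python
import re
from typing import NamedTuple, Optional, Sequence

# Spark renders a bound column reference as input[<ordinal>, <type>, <nullable>].
# This pattern only handles type names without commas (e.g. not decimal(10,2)).
_BOUND_REF = re.compile(r"input\[(\d+),\s*([^,\]]+),\s*(true|false)\]")


class BoundRef(NamedTuple):
    ordinal: int    # 0-based position in the operator's input row
    data_type: str  # type as printed by Spark, e.g. "string"
    nullable: bool


def parse_bound_ref(message: str) -> Optional[BoundRef]:
    """Return the first input[...] reference found in an error message, if any."""
    m = _BOUND_REF.search(message)
    if m is None:
        return None
    return BoundRef(int(m.group(1)), m.group(2).strip(), m.group(3) == "true")


def guess_column(ref: BoundRef, input_columns: Sequence[str]) -> Optional[str]:
    """Map the ordinal to a name. ASSUMES input_columns lists the operator's
    input row in order, which may differ from the table schema."""
    if 0 <= ref.ordinal < len(input_columns):
        return input_columns[ref.ordinal]
    return None


err = "Cannot evaluate expression: rangepartitionid(input[14, string, true], 1000)"
ref = parse_bound_ref(err)
print(ref)  # BoundRef(ordinal=14, data_type='string', nullable=True)

cols = [f"c{i}" for i in range(20)]  # hypothetical column names c0..c19
print(guess_column(ref, cols))  # c14, i.e. the 15th column
```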

TD

04/18/2023, 8:42 PM
Aargh. This is getting very complex, too complex for a Slack thread that can get deleted any time. Can you make a GitHub issue, please!

Nitya Thacker

04/28/2023, 7:54 PM
done

abhijeet_naib

05/23/2023, 2:09 PM
@TD @Nitya Thacker Were you able to resolve this issue? I'm having a similar issue.

Nitya Thacker

05/23/2023, 2:31 PM
No, I could not resolve it. I have created a GitHub issue for this: https://github.com/delta-io/delta/issues/1726

TD

05/23/2023, 2:33 PM
@Nitya Thacker did you try what was suggested in the ticket? Did that not work?

abhijeet_naib

05/23/2023, 2:36 PM
Facing the same issue with 2.3.0 as well: ./bin/spark-shell --packages io.delta:delta-core_2.12:2.3.0
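For reference, the --packages value in that command follows the Maven groupId:artifactId_scalaVersion:version pattern, here io.delta:delta-core_2.12:2.3.0 (delta-core is the Delta 2.x artifact name). The Delta quickstart also sets two session configs, spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension and spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog; we are assuming these are the "configs" the thread refers to, since the ticket's exact suggestion is not quoted here. As we understand it, the session extension registers the optimizer rule that rewrites rangepartitionid before execution, which would make a missing extension one plausible cause of the "Cannot evaluate expression" error, although the thread reports the configs were already set in this case. A minimal sketch that assembles the launch command; delta_spark_shell_args is a hypothetical helper.

```python
import shlex
from typing import List


def delta_spark_shell_args(delta_version: str, scala_binary: str = "2.12") -> List[str]:
    """Build a spark-shell command line that pulls Delta from Maven and enables
    the Delta session extension and catalog (settings from the Delta quickstart)."""
    # Maven coordinate: groupId:artifactId_<scala binary version>:version.
    # "delta-core" is the artifact name used by Delta 2.x releases.
    package = f"io.delta:delta-core_{scala_binary}:{delta_version}"
    return [
        "./bin/spark-shell",
        "--packages", package,
        "--conf", "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension",
        "--conf",
        "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for Delta 2.3.0 on Scala 2.12.
    print(shlex.join(delta_spark_shell_args("2.3.0")))
```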

Nitya Thacker

05/23/2023, 2:36 PM
I had those configs set up already. With an integer column I got a different error: "will be ineffective, because we currently do not collect stats for these columns".

TD

05/23/2023, 3:10 PM
That is definitely a different error. It happens because the integer column you specified is probably not among the first 32 columns on which we collect stats.

Nitya Thacker

05/23/2023, 3:24 PM
If I have struct-type columns, do their child columns count toward the 32?
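TD's explanation can be checked mechanically. Delta collects file statistics only for the first N columns of the schema, where N is the table property delta.dataSkippingNumIndexedCols (default 32, with -1 meaning all columns). The sketch below flags Z-order columns that fall outside that window. For the struct question it ASSUMES that each leaf field of a struct counts individually toward N, in schema order; verify that against the Delta version you run. The nested-dict schema and both helper functions are hypothetical illustrations, not a Spark or Delta API.

```python
from typing import Dict, Iterator, List, Sequence, Union

# Hypothetical schema representation: each field maps to a type name (leaf)
# or to a nested dict (struct). Dicts keep insertion order, which stands in
# for the table's column order.
Schema = Dict[str, Union[str, "Schema"]]


def leaf_columns(schema: Schema, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of leaf fields in schema order (depth-first)."""
    for name, dtype in schema.items():
        path = f"{prefix}{name}"
        if isinstance(dtype, dict):
            yield from leaf_columns(dtype, prefix=f"{path}.")
        else:
            yield path


def columns_without_stats(
    schema: Schema, zorder_cols: Sequence[str], num_indexed_cols: int = 32
) -> List[str]:
    """Return the Z-order columns outside the first num_indexed_cols leaf
    columns (-1 means every column is indexed).
    ASSUMPTION: each struct leaf counts individually toward the limit."""
    leaves = list(leaf_columns(schema))
    indexed = set(leaves if num_indexed_cols < 0 else leaves[:num_indexed_cols])
    return [c for c in zorder_cols if c not in indexed]


# 1 top-level int + a struct with 31 leaves = 32 leaves, so col2 is the 33rd.
schema = {"id": "int", "meta": {f"f{i}": "string" for i in range(31)}, "col2": "int"}
print(columns_without_stats(schema, ["id", "col2"]))  # ['col2']
```

Under this assumption, a wide struct early in the schema can push later top-level columns past the limit; moving the Z-order columns earlier or raising the property are the two levers the sketch models.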

venki

05/23/2023, 4:20 PM
Hi @Nitya Thacker, is it possible to create a repro on a small table that you can share to debug?

Nitya Thacker

05/23/2023, 4:22 PM
The GitHub issue has some code that reproduces that error.

venki

05/23/2023, 4:22 PM
awesome, taking a look at it.