
Kashyap Bhatt

01/30/2023, 3:22 PM
Hello! Where do I find the correct versions of various modules to pass to --packages when running pyspark locally? E.g. delta-core_2.12:2.2.0 works fine:
pyspark --packages io.delta:delta-core_2.12:2.2.0,org.apache.hadoop:hadoop-aws:3.3.4 \
        --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
        --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
But delta-core_2.12:2.1.0 throws a NullPointerException at start, which I assume is due to some version incompatibility:
pyspark --packages io.delta:delta-core_2.12:2.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
        --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
        --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
The question is: where do I find the compatible versions of delta-core, io.delta and hadoop? My source was https://docs.databricks.com/release-notes/runtime/12.1.html (I use Databricks Runtime), but I do not see delta-core there, so when I change from 2.2.0 to 2.1.0, I don't know what to change 2.12 to.
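(Worth noting when debugging this: each Delta release is built against a specific Spark version, so the locally installed pyspark version matters as much as the Delta coordinate. A sketch of checking it:)

# Print the local Spark version to compare against the Spark version
# the chosen Delta release targets (see Delta's release notes):
python -c "import pyspark; print(pyspark.__version__)"
# or, inside a running pyspark shell:
spark.version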
NullPointerException full log attached, if it matters.

Scott Sandre (Delta Lake)

01/30/2023, 4:53 PM
> but I do not see delta-core there, so when I change from 2.2.0 to 2.1.0 I don't know what to change 2.12 to?
2.12 is the Scala version, you don't need to change it.
On which environment are you running this? I.e. EMR?
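(For reference, the anatomy of the Maven coordinate, so only the last segment changes between Delta releases:)

# io.delta : delta-core_2.12 : 2.1.0
# ^ groupId  ^ artifactId, suffixed with the Scala binary version (2.12)
#                            ^ Delta version (the only part to change)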

Kashyap Bhatt

01/30/2023, 4:55 PM
On an Ubuntu VM running on my laptop (Windows host, though I guess that doesn't matter).
I refer to the Databricks page because the code I'm testing/running on my laptop will eventually be deployed to Databricks, so I'm trying to ensure that the versions of the various modules (delta, pandas, hadoop, ...) on my local machine match the ones in the Databricks Runtime I plan to run against in production.
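(As an aside, a sketch of one way to pin matching versions locally via pip, using the delta-spark package's configure_spark_with_delta_pip helper; the version numbers below are illustrative assumptions to be read off the target runtime's release notes, and extra_packages assumes a delta-spark release recent enough to support it:)

# pip install pyspark==3.3.1 delta-spark==2.2.0   <- illustrative versions
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("local-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip appends the delta-core Maven coordinate
# matching the installed delta-spark version; extra_packages can carry
# hadoop-aws for S3 access.
spark = configure_spark_with_delta_pip(
    builder, extra_packages=["org.apache.hadoop:hadoop-aws:3.3.4"]
).getOrCreate()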
> 2.12 is the Scala version, you don't need to change it
@Scott Sandre (Delta Lake), just noticed this. So is there something else that causes that NPE when I use 2.1.0? I attached the full stdout here. If I just change 2.1.0 to 2.2.0 it works; I can provide that stdout too if needed. Another way to describe the problem:
• When I try to start a pyspark session with delta 2.2.0 and hadoop 3.3.4 (which correspond to Databricks Runtime 12.1), all is well.
• When I try to start a pyspark session with delta 2.1.0 and hadoop 3.3.4 (which correspond to Databricks Runtime 11.3), I get an NPE.

Scott Sandre (Delta Lake)

01/30/2023, 9:41 PM
Yup, I understand the problem, but the logs don't show the cause of the NPE 😕
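(One low-effort thing to try, sketched under the assumption that the pyspark launcher forwards spark-submit flags: --verbose sometimes surfaces more launch-time and dependency-resolution detail.)

pyspark --verbose --packages io.delta:delta-core_2.12:2.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
        --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
        --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"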

Grainne B

01/31/2023, 5:04 AM
I am getting this exact same error with 2.2.0!
• io.delta:delta-core_2.12:2.2.0
• org.apache.hadoop:hadoop-aws:3.3.4