Kashyap Bhatt
01/30/2023, 3:22 PM
How do I pick compatible versions for the --packages option when running pyspark locally? E.g. delta-core_2.12:2.2.0 works fine:
pyspark --packages io.delta:delta-core_2.12:2.2.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
But delta-core_2.12:2.1.0 throws a NullPointerException at start, which I assume is due to some version incompatibility:
pyspark --packages io.delta:delta-core_2.12:2.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
Question is: where do I find the compatible versions of delta-core, io.delta, and hadoop? My source was https://docs.databricks.com/release-notes/runtime/12.1.html (I use Databricks Runtime), but I do not see delta-core there, so when I change from 2.2.0 to 2.1.0 I don't know what to change 2.12 to?
Attached the NullPointerException full log if it matters.
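(Not an answer from the thread, but one hedged way to narrow this down: each delta-core release is built against a specific Apache Spark line, and the mapping is listed on the Delta Lake releases page at docs.delta.io, so the first thing to check is which Spark version the local pyspark actually is. A minimal sketch, assuming a pip-installed pyspark:)

# Print the locally installed Spark version and compare it against the
# Delta Lake / Apache Spark compatibility table before picking a delta-core version.
import pyspark
print(pyspark.__version__)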
Scott Sandre (Delta Lake)
01/30/2023, 4:53 PM
> but I do not see delta-core there, so when I change from 2.2.0 to 2.1.0 I don't know what to change 2.12 to?
2.12 is the Scala version; you don't need to change it.
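(For context, each --packages entry is a Maven coordinate of the form group:artifact:version; the tiny sketch below is purely illustrative and just breaks down the coordinate used in the commands above.)

# Purely illustrative: what each segment of the --packages coordinate means.
coordinate = "io.delta:delta-core_2.12:2.1.0"
group_id, artifact_id, version = coordinate.split(":")
print(group_id)     # io.delta        -> Maven group
print(artifact_id)  # delta-core_2.12 -> artifact name plus Scala binary version (2.12)
print(version)      # 2.1.0           -> Delta Lake release; this is the part to change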
Kashyap Bhatt
01/30/2023, 4:55 PM
> 2.12 is the Scala version; you don't need to change it.
@Scott Sandre (Delta Lake), just noticed this. So is there something else that causes that NPE when I use 2.1.0? I attached the full stdout here. If I just change 2.1.0 to 2.2.0 it works; I can provide stdout if needed. Another way to describe the problem:
• When I try to start a pyspark session with delta 2.2.0 and hadoop 3.3.4 (which correspond to Databricks Runtime 12.1), all is well.
• When I try to start a pyspark session with delta 2.1.0 and hadoop 3.3.4 (which correspond to Databricks Runtime 11.3), I get an NPE.
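(Not from the thread, but a hedged alternative for local testing: when pyspark is pip-installed, the delta-spark package can wire up the session with a delta-core build that matches it, which sidesteps hand-picking the coordinate. A minimal sketch, assuming matching pyspark and delta-spark releases are installed and a delta-spark version recent enough to support extra_packages:)

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("delta-local")
    # Same two settings as the --conf flags in the CLI commands above.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip adds the delta-core coordinate that matches the
# installed delta-spark/pyspark pair; hadoop-aws still has to be listed explicitly.
spark = configure_spark_with_delta_pip(
    builder, extra_packages=["org.apache.hadoop:hadoop-aws:3.3.4"]
).getOrCreate()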
Scott Sandre (Delta Lake)
01/30/2023, 9:41 PM
Grainne B
01/31/2023, 5:04 AM
• io.delta:delta-core_2.12:2.2.0
• org.apache.hadoop:hadoop-aws:3.3.4