Amit Singh Hora
06/07/2023, 5:17 AM
./pyspark --packages io.delta:delta-core_2.12:2.4.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "fs.s3a.aws.credentials.provide=com.amazonaws.auth.DefaultAWSCredentialsProviderChain" \
--conf "spark.sql.hive.metastore.version=3.1.3" \
--conf "spark.sql.hive.metastore.jars=maven"
This is my Spark configuration:
spark = SparkSession.builder \
.appName("DeltaTableExample") \
.master("local[*]") \
.config("spark.hadoop.javax.jdo.option.ConnectionURL", "jdbc:<postgresql://localhost:5435/hive_metastore2>") \
.config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "org.postgresql.Driver") \
.config("spark.hadoop.javax.jdo.option.ConnectionUserName", "username") \
.config("spark.hadoop.javax.jdo.option.ConnectionPassword", "password") \
.config("spark.sql.warehouse.dir","<s3a://location/hivewarehouse>") \
.config("spark.sql.extensions","io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.delta.catalog.DeltaCatalog") \
.enableHiveSupport() \
.getOrCreate()
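For context, the write that does succeed looks roughly like this (just a sketch, the DataFrame contents are illustrative and the path is the same S3 location):

# Illustrative only: a small DataFrame written as a Delta table to the S3 path.
from pyspark.sql import Row

df = spark.createDataFrame([Row(id=1, value="a"), Row(id=2, value="b")])
df.write.format("delta").mode("overwrite").save("s3a://location/deltatable")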
I am able to write the Delta table to S3, but the moment I try to create the table with Spark SQL, to have its entry available in the Hive metastore,
delta_table_path = "s3a://location/deltatable"
# Register Delta table in Hive Metastore
spark.sql(f"CREATE TABLE IF NOT EXISTS my_table USING DELTA LOCATION '{delta_table_path}'")
I start getting these errors. I don't understand where it is picking up this /user/hive/warehouse/my_table Hive warehouse location:
23/06/07 00:47:57 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `spark_catalog`.`default`.`my_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
00:48:01.034 [Thread-3] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - MetaException(message:file:/user/hive/warehouse/my_table-__PLACEHOLDER__ is not a directory or unable to create one)
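For what it's worth, this is how I'd check which warehouse locations the session has actually resolved (a rough sketch, the second line goes through the internal _jsc handle to read the Hadoop configuration):

# Sketch: print the warehouse locations the running session resolves.
print(spark.conf.get("spark.sql.warehouse.dir"))
print(spark.sparkContext._jsc.hadoopConfiguration().get("hive.metastore.warehouse.dir"))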
Please note I am running the Hive metastore standalone, and the metastore schema init worked without any errors.
Matthew Powers
06/07/2023, 8:57 AM