Sekhar Sahu
02/08/2023, 9:58 PM
23/02/08 21:53:36 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `default`.`delta_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/usr/lib/spark/python/pyspark/sql/session.py", line 1034, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self)
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 196, in deco
raise converted from None
pyspark.sql.utils.IllegalArgumentException: Can not create a Path from an empty string
Code:
## Create a DataFrame
data = spark.createDataFrame(
    [("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
     ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
     ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
     ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z")],
    ["id", "creation_date", "last_update_time"])
## Write a DataFrame as a Delta Lake dataset to the S3 location
spark.sql("""CREATE TABLE IF NOT EXISTS delta_table (id string, creation_date string,
last_update_time string)
USING delta location
'<s3://DOC-EXAMPLE-BUCKET/example-prefix/db/delta_table>'""");
data.writeTo("delta_table").append()
pyspark command used
pyspark --master yarn --deploy-mode client --repositories http://repo.hortonworks.com/content/groups/public/,https://repos.spark-packages.org/,https://oss.sonatype.org/content/repositories/snapshots --conf spark.sql.adaptive.coalescePartitions.initialPartitionNum=5000 --conf spark.databricks.delta.optimize.maxFileSize=250000 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.executor.extraJavaOptions=-XX:+UseG1GC --conf spark.driver.maxResultSize=0 --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog --conf spark.databricks.delta.optimize.repartition.enabled=true --conf spark.databricks.delta.autoOptimize=true --packages io.delta:delta-core_2.12:2.1.0
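Since the CREATE TABLE above already points at an explicit S3 location, a path-based write is a sketch worth trying (same bucket and prefix assumed as above); it bypasses the Hive metastore's location handling, which is where the empty-path error points:
## Write the DataFrame straight to the S3 path as a Delta dataset,
## skipping the metastore-managed table (path assumed from the CREATE TABLE above)
delta_path = "s3://DOC-EXAMPLE-BUCKET/example-prefix/db/delta_table"
data.write.format("delta").mode("append").save(delta_path)
## Read it back by path rather than by table name
spark.read.format("delta").load(delta_path).show()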
Grainne B
04/06/2023, 3:54 AM
amazon/aws-glue-libs:glue_libs_4.0.0_image_01
Run container
docker run -it -v ~/.aws:/home/glue_user/.aws -e AWS_PROFILE=saml -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pyspark amazon/aws-glue-libs:glue_libs_4.0.0_image_01
Create pyspark shell
pyspark --packages org.apache.hadoop:hadoop-aws:3.2.2,io.delta:delta-core_2.12:1.2.1
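Worth noting: the shell command above loads the Delta package but does not set the Delta SQL extension or catalog, which Delta Lake's docs list as required for `create table ... using delta`. A sketch of the same invocation with those two documented configs added:
pyspark --packages org.apache.hadoop:hadoop-aws:3.2.2,io.delta:delta-core_2.12:1.2.1 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog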
Code used
spark.sql("create table spark_docker_testing.aws_glue_docker using delta location '<s3://data-testing/test_file>'")
Error received
23/04/06 03:41:34 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `spark_docker_testing`.`aws_glue_docker_delta` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/glue_user/spark/python/pyspark/sql/session.py", line 1034, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self)
File "/home/glue_user/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 196, in deco
raise converted from None
pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://data-testingaws_glue_docker_delta-__PLACEHOLDER__
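The fused bucket-and-table string in that URI reads like a database location with no trailing slash being joined to the table directory name. One speculative sketch around it, assuming the database should live under s3://data-testing/ (the explicit database LOCATION here is an assumption, not something from the message):
## Assumption: give the database an explicit location ending in a slash,
## then point the table at a fully qualified S3 path
spark.sql("create database if not exists spark_docker_testing location 's3://data-testing/'")
spark.sql("""create table if not exists spark_docker_testing.aws_glue_docker
             using delta location 's3://data-testing/test_file'""")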