Jared Grove
06/18/2023, 9:03 PM
When I run spark-submit on my local host I have no errors. However, I wanted to test this Spark application in Docker, which I guess is still technically my local host, but I want to submit the program from the Docker container. I have five containers: a spark-master, two spark-workers, a spark-history-server, and a spark-driver. All containers are on the same Docker network. Inside the spark-driver container is where I launch:

spark-submit --properties-file ./src/spark/spark-defaults.conf ./src/start_pipeline.py
I receive the following error:

An error occurred while calling o515.load.
: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 156) (172.23.0.3 executor 3): org.apache.spark.SparkFileNotFoundException: File file:/opt/ufo-lakehouse/lakehouse/ufo/bronze/_delta_log/00000000000000000000.json does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
This file does exist; the Spark program created it! I first thought it might be a permissions issue, so I set all my folders/files to permission 777, but I still get the same error. Any help or guidance would be much appreciated. Thank you!
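For context on the error above: Spark resolves a file: URI against the local filesystem of whichever container a task happens to run in, so a Delta log written inside the spark-driver container is invisible to executors running in the spark-worker containers unless every container mounts the same directory at the same path. Below is a minimal, hypothetical docker-compose sketch of such a shared bind mount; the service names, image tag, and host directory are illustrative assumptions, and only the /opt/ufo-lakehouse/lakehouse path comes from the error message itself.

# docker-compose.yml (sketch): mount one host directory into every Spark
# container at the identical path, so that
# file:/opt/ufo-lakehouse/lakehouse/... resolves to the same data on the
# driver and on each executor. spark-history-server omitted for brevity.
services:
  spark-master:
    image: apache/spark:3.4.1        # image/tag is an assumption
    volumes:
      - ./lakehouse:/opt/ufo-lakehouse/lakehouse
  spark-worker-1:
    image: apache/spark:3.4.1
    volumes:
      - ./lakehouse:/opt/ufo-lakehouse/lakehouse
  spark-worker-2:
    image: apache/spark:3.4.1
    volumes:
      - ./lakehouse:/opt/ufo-lakehouse/lakehouse
  spark-driver:
    image: apache/spark:3.4.1
    volumes:
      - ./lakehouse:/opt/ufo-lakehouse/lakehouse

An alternative with the same effect is to point the pipeline at storage every container can reach (for example an S3-compatible bucket or HDFS) instead of file: paths.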
Rahul Sharma
06/19/2023, 4:10 AM

Jared Grove
06/19/2023, 2:09 PM

Rahul Sharma
06/20/2023, 4:15 AM

Jared Grove
06/20/2023, 2:38 PM