Jared Grove 06/18/2023, 9:03 PM
On my localhost I have no errors. However, I wanted to test this Spark application in Docker, which I guess is still technically my localhost, but I want to submit the program from the Docker container. I have five containers: spark-master, two spark-workers, spark-history-server, and a spark-driver. All containers are on the same Docker network. Inside the spark-driver container is where I launch the job.
spark-submit --properties-file ./src/spark/spark-defaults.conf ./src/start_pipeline.py
I receive the following error:
An error occurred while calling o515.load.
: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 156) (172.23.0.3 executor 3): org.apache.spark.SparkFileNotFoundException: File file:/opt/ufo-lakehouse/lakehouse/ufo/bronze/_delta_log/00000000000000000000.json does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
This file does exist; the Spark program created it! I first thought it might be a permissions issue, so I set all my folders/files to permission 777, but I still get the same error. Any help or guidance would be much appreciated. Thank you!
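A common cause of this pattern (the file exists where the driver runs, but executors throw SparkFileNotFoundException for it) is that a `file:/` path only exists inside the driver container's filesystem, not inside the spark-worker containers where the executors actually run. One way to make the path resolve identically everywhere is to mount the same host directory into every Spark container. A minimal docker-compose sketch, assuming service names matching the containers described above; the host path `./lakehouse-data` and the exact service names are assumptions, not taken from the poster's setup:

```yaml
# Hypothetical docker-compose fragment: bind-mount one shared host
# directory into every Spark container so that
# file:/opt/ufo-lakehouse/... refers to the same files on the driver
# and on each executor.
services:
  spark-master:
    volumes:
      - ./lakehouse-data:/opt/ufo-lakehouse
  spark-worker-1:
    volumes:
      - ./lakehouse-data:/opt/ufo-lakehouse
  spark-worker-2:
    volumes:
      - ./lakehouse-data:/opt/ufo-lakehouse
  spark-driver:
    volumes:
      - ./lakehouse-data:/opt/ufo-lakehouse
```

The alternative to a shared bind mount is to keep the table on storage every node can reach (HDFS, S3/MinIO, etc.) instead of a `file:/` path.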
Rahul Sharma 06/19/2023, 4:10 AM
Jared Grove 06/19/2023, 2:09 PM
Rahul Sharma 06/20/2023, 4:15 AM
Jared Grove 06/20/2023, 2:38 PM