Robin Moffatt
05/12/2023, 3:06 PM
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 14.0 failed 1 times, most recent failure: Lost task 2.0 in stage 14.0 (TID 66) (1e1dc78cf259 executor driver): java.io.FileNotFoundException:
No such file or directory: s3a://example/data_load/raw/soil/_delta_log/00000000000000000001.json
It is possible the underlying files have been updated. You can explicitly invalidate
the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by
recreating the Dataset/DataFrame involved.
❓ So, on to my question:
I've found that if I restart the Jupyter kernel I can re-query the delta table successfully.
However, I also thought I'd be able to re-create the Spark session:
spark.stop()
spark = SparkSession.builder.appName […]
to the same effect, but doing this I still get the FileNotFoundException, suggesting that something's cached somewhere.
I've tried the suggestion in the error message, but this also throws the FileNotFoundException:
refresh table delta.`s3a://example/data_load/my/table`
Is there a way to programmatically flush whatever's been cached, without the "turn it off and turn it back on again" approach of restarting the kernel?
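One programmatic option worth noting here (not raised in the thread itself, but a standard pyspark.sql.Catalog API) is refreshByPath, which invalidates the cached data and metadata for any DataFrame built on a given source path. A minimal sketch, reusing the illustrative s3a path from the error above:

# Sketch only: invalidate Spark's cached data/metadata for one source path.
# refreshByPath is a standard Catalog method; the path is the illustrative
# one from the error message, not a confirmed fix from this thread.
spark.catalog.refreshByPath("s3a://example/data_load/raw/soil")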
Sumanth Bo3
05/13/2023, 7:07 PM
spark.catalog.clearCache()
I remember reading something like this somewhere but don't have the actual source. Try this out.
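A minimal usage sketch of that suggestion: clearCache() is a standard pyspark.sql.Catalog method, and the re-read of the Delta path afterwards is illustrative, again using the path from the error above:

from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()  # assumes a session is already running
spark.catalog.clearCache()               # drop all cached tables/DataFrames in this session
# Re-read the Delta table after clearing the cache (illustrative path):
df = spark.read.format("delta").load("s3a://example/data_load/raw/soil")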
Robin Moffatt
05/18/2023, 3:27 PM