Álvaro José Baranoski
02/06/2023, 2:33 PM
I'm trying to create a Delta table with a job submitted from Airflow via spark-submit.
The table that I'm trying to create is the very first one in the getting started section, like so:
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
However, when executing the command, the log shows the following error:
[2023-02-06, 11:09:36 -03] {spark_submit.py:495} INFO - 23/02/06 11:09:36 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2) (172.19.0.3 executor 2): java.io.FileNotFoundException:
[2023-02-06, 11:09:36 -03] {spark_submit.py:495} INFO - File file:/home/alvaro/airflow/tmp/delta-table/_delta_log/00000000000000000000.json does not exist
[2023-02-06, 11:09:36 -03] {spark_submit.py:495} INFO -
[2023-02-06, 11:09:36 -03] {spark_submit.py:495} INFO - It is possible the underlying files have been updated. You can explicitly invalidate
[2023-02-06, 11:09:36 -03] {spark_submit.py:495} INFO - the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by
[2023-02-06, 11:09:36 -03] {spark_submit.py:495} INFO - recreating the Dataset/DataFrame involved.
What should I do to get the Delta table created on my local machine? Is it possible to do so using this kind of Spark cluster? If not, what is the best way to run Spark + Delta Lake + Airflow on my local machine?
Thanks in advance!!
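The log suggests the remote executor (172.19.0.3) is resolving the save path against its own filesystem, so the _delta_log directory written by the driver under /home/alvaro/airflow is simply not there. One way to reproduce the quickstart entirely on a single machine is to run Spark in local mode, where driver and executors share one filesystem. A minimal sketch, assuming the delta-spark pip package is installed (configure_spark_with_delta_pip and the two configs below are the documented quickstart setup):

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Local-mode session: driver and executors share the same filesystem
builder = (
    SparkSession.builder.appName("delta-quickstart")
    .master("local[*]")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# Adds the matching delta-core jar to the session's classpath
spark = configure_spark_with_delta_pip(builder).getOrCreate()

spark.range(0, 5).write.format("delta").save("/tmp/delta-table")

If the job has to keep running against the remote cluster instead, the save path needs to point at storage every executor can reach (a shared volume or an object store), not a directory that only exists on the host.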
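For the Airflow side, one option is to let SparkSubmitOperator pass the Delta package and configs to spark-submit. This is a sketch rather than a confirmed fix from the thread: the task_id, script path, and connection name are placeholders, the "spark_local" connection is assumed to point at a local[*] master, and the Delta coordinates must match your Spark and Scala versions.

from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Hypothetical task: submits the quickstart script with Delta Lake enabled.
create_delta_table = SparkSubmitOperator(
    task_id="create_delta_table",                      # placeholder name
    application="dags/scripts/create_delta_table.py",  # placeholder script path
    conn_id="spark_local",                             # assumed connection with master local[*]
    packages="io.delta:delta-core_2.12:2.2.0",         # match your Spark/Scala versions
    conf={
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    },
)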