
Hanan Shteingart

01/15/2023, 5:38 PM
I have created a Delta table using Auto Loader, yet when I try to look at the data in the "Data" tab it says:
An error occurred while fetching table: dsm09collectx
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Incompatible format detected.
A transaction log for Databricks Delta was found at
<s3://nbu-ml/projects/rca/msft/dsm09collectx/delta/_delta_log>
,
but you are trying to read from
<s3://nbu-ml/projects/rca/msft/dsm09collectx/delta>
using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.
To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.databricks.com/delta/index.html
What is the issue and how can I solve it? When I read the data in Redash, I also cannot parse it. When I read the table using
spark.table(table_name)
it works fine. The code generating the delta table:
(spark.readStream
  .format("cloudFiles")
  .option("header", "true")
  .option("cloudFiles.partitionColumns", "date, hour")
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.schemaHints", SCHEMA_HINT)
  .option("cloudFiles.schemaLocation", checkpoint_path)
  .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
  .load(file_path)
  .select("*", input_file_name().alias("source_file"), current_timestamp().alias("processing_time"))
  .writeStream
  .option("checkpointLocation", checkpoint_path)
  .option("path", output_path)
  .trigger(availableNow=True)
  .toTable(table_name))
I see the created table is not delta by running
delta.DeltaTable.isDeltaTable(spark, TABLE_NAME)
so I added format("delta"):
writeStream
  .format("delta")
  .option("checkpointLocation", checkpoint_path)
  .option("path", output_path)
  .trigger(availableNow=True)
  .toTable(table_name)
But it didn't help (I have VACUUMed the table and dropped it; I have checked that the checkpoint and delta paths are empty).
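As a side note, the `_delta_log` directory mentioned in the error message is what distinguishes a Delta table from a plain parquet directory, and its presence is essentially what `DeltaTable.isDeltaTable` checks for. A minimal local sketch of that idea, using a hypothetical filesystem path rather than the real S3 location:

```python
import os
import tempfile

def looks_like_delta_table(path: str) -> bool:
    # A Delta table keeps its transaction log in a _delta_log
    # subdirectory; a plain parquet directory does not have one.
    return os.path.isdir(os.path.join(path, "_delta_log"))

# Illustration with a throwaway local directory (assumed example,
# not the asker's actual table path):
with tempfile.TemporaryDirectory() as d:
    print(looks_like_delta_table(d))          # no _delta_log yet
    os.makedirs(os.path.join(d, "_delta_log"))
    print(looks_like_delta_table(d))          # now recognized
```

If the directory at the output path only ever contains parquet files and no `_delta_log`, the write never went through the Delta writer.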

Yousry Mohamed

01/17/2023, 9:49 AM
It seems you are creating an external table, hence dropping the table will not delete the parquet and log files in the table location. Try to start fresh: drop the table, then delete the bucket folders (for the table, the streaming checkpoint, and the schema checkpoint). I also see that you use the same location
checkpoint_path
for both the schema checkpoint and the streaming checkpoint. They are different things and should live in different folders.
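Following that advice, a minimal sketch of a layout that keeps the two locations apart (the bucket and folder names here are assumptions for illustration, not the asker's actual paths):

```python
# Hypothetical base path; the real bucket/prefix will differ.
base_path = "s3://my-bucket/projects/my_table"

# Distinct folders: one for Auto Loader's inferred-schema store,
# one for the streaming query's checkpoint.
schema_path = f"{base_path}/_schema_location"        # for cloudFiles.schemaLocation
checkpoint_path = f"{base_path}/_stream_checkpoint"  # for checkpointLocation

# The two must never point at the same folder.
assert schema_path != checkpoint_path
```

Passing `schema_path` to `cloudFiles.schemaLocation` and `checkpoint_path` to the `checkpointLocation` write option keeps the schema store and the streaming state from clobbering each other.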