Mohammad Mohtashim Khan

02/24/2023, 6:29 PM
Hello, I have the following PySpark streaming code in my foreachBatch function, which writes each micro-batch DataFrame to Delta Lake:
# create or delete operation
delta_lake_table.alias("main_table").merge(
    latest_changes_delete_create_df.alias("update_table"), merge_condition
).whenMatchedDelete(
    condition="update_table.op = 'd'"
).whenNotMatchedInsertAll(
    condition="update_table.op = 'c' OR update_table.op = 'r'"
).execute()
# update operation
delta_lake_table.alias("main_table").merge(
    latest_changes_update_df.alias("update_table"), merge_condition
).whenMatchedUpdateAll(
    condition="update_table.op = 'u' OR update_table.op = 'r'"
).execute()
Now, the data schema can evolve, so I am using whenMatchedUpdateAll and whenNotMatchedInsertAll, and I have also set the following Spark configuration: 'spark.databricks.delta.schema.autoMerge.enabled = true'. However, when a new column comes in, I get the following error:
pyspark.sql.utils.AnalysisException: The schema of your Delta table has changed in an incompatible way since your DataFrame
or DeltaTable object was created. Please redefine your DataFrame or DeltaTable object.
Changes:
Latest schema has additional field(s): op
Can someone please help and tell me what I am doing wrong? I am using Delta Lake 2.2.0 (open source) on top of MinIO, and I have followed this documentation: https://docs.delta.io/latest/delta-update.html#automatic-schema-evolution . Thank you.

PAWAN SHUKLA

02/26/2023, 10:09 PM
Please look at the link below:

Gerhard Brueckl

02/27/2023, 11:05 AM
merge does not work with schemaEvolution. What I do in that case is compare the schema of the source and the (merge) target, and if they differ, I first do an append with where("1=0") to update the schema, and afterwards run the merge.
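A rough sketch of that workaround, in case it helps. The function names, `table_path`, and the idea of re-creating the DeltaTable object after the empty append are my assumptions, not something taken from the code posted above; column names are compared case-insensitively because that is Spark's default. Treat it as a starting point, not a verified fix for the error in the thread.

```python
def missing_columns(source_columns, target_columns):
    """Return the source columns that the target does not have yet (order kept).

    Names are compared case-insensitively, matching Spark's default behaviour.
    """
    target = {c.lower() for c in target_columns}
    return [c for c in source_columns if c.lower() not in target]


def evolve_schema_then_merge(spark, table_path, source_df, merge_condition):
    """Hypothetical helper: widen the table schema first, then run the merge."""
    # Imported lazily so missing_columns() can be used without delta-spark.
    from delta.tables import DeltaTable

    target_columns = spark.read.format("delta").load(table_path).columns

    if missing_columns(source_df.columns, target_columns):
        # Append zero rows with mergeSchema enabled: no data is written,
        # but the new columns are added to the table schema.
        (
            source_df.where("1=0")
            .write.format("delta")
            .mode("append")
            .option("mergeSchema", "true")
            .save(table_path)
        )

    # Build the DeltaTable object after the schema change (assumption: this
    # avoids merging through an object created against the old schema).
    delta_table = DeltaTable.forPath(spark, table_path)
    (
        delta_table.alias("main_table")
        .merge(source_df.alias("update_table"), merge_condition)
        .whenMatchedUpdateAll(
            condition="update_table.op = 'u' OR update_table.op = 'r'"
        )
        .execute()
    )
```

Inside foreachBatch this would be called once per micro-batch, e.g. evolve_schema_then_merge(spark, table_path, latest_changes_update_df, merge_condition).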

Mohammad Mohtashim Khan

02/27/2023, 11:07 AM
Okay, thank you. But then this is a bug, because according to the documentation it should work.

PAWAN SHUKLA

02/27/2023, 12:44 PM
Can you please check the value of the new column?