Situation: We need to create empty tables without providing a schema up front, before another pipeline kicks off and writes data to that location (in Delta format, of course).
For this to work, we set spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true") so that schema evolution is enabled when the pipeline writes data to the empty location created in step one.
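For reference, here is a minimal sketch of how that setting could be applied for a single session. The helper name enable_auto_merge is hypothetical, and it assumes an existing SparkSession is passed in; the key is written in lowercase with the .enabled suffix, as it appears in the Delta Lake documentation.

```python
# Documented Delta Lake session setting for automatic schema merging.
AUTO_MERGE_KEY = "spark.databricks.delta.schema.autoMerge.enabled"


def enable_auto_merge(spark):
    """Turn on Delta schema auto-merge for this Spark session only.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    Returns the value read back, so the caller can confirm it was applied.
    """
    spark.conf.set(AUTO_MERGE_KEY, "true")
    return spark.conf.get(AUTO_MERGE_KEY)
```

Reading the value back is just a quick way to confirm that the key was set on the session the pipeline actually uses.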
However, when writing data to the empty location in step two, I still get an error saying that I need to set the save mode to overwrite and also specify option("overwriteSchema", "true").
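To make the error's suggestion concrete, this is roughly what the write would look like if the options were passed per write. It is only a sketch: write_delta and its schema_mode parameter are hypothetical names, not part of our existing writer, and df is assumed to be a Spark DataFrame.

```python
def write_delta(df, path, schema_mode=None):
    """Write `df` to a Delta path (hypothetical helper, not our real writer).

    schema_mode=None        -> plain append, no schema options
    schema_mode="merge"     -> append with mergeSchema for this write only
    schema_mode="overwrite" -> what the error asks for: overwrite + overwriteSchema
    """
    writer = df.write.format("delta")
    if schema_mode == "overwrite":
        writer = writer.mode("overwrite").option("overwriteSchema", "true")
    elif schema_mode == "merge":
        writer = writer.mode("append").option("mergeSchema", "true")
    elif schema_mode is None:
        writer = writer.mode("append")
    else:
        raise ValueError(f"unknown schema_mode: {schema_mode!r}")
    writer.save(path)
```

With the default left as a plain append, existing callers would keep their current behavior, and only the pipeline writing to the empty location would pass schema_mode.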
Is there any way to set these two options at the cluster level, so that we do not have to change our current Delta writer, which is used by hundreds of other pipelines?
Is my approach correct?