nikhil raman

03/17/2023, 7:34 PM
Hi all, I am new to Delta Lake and I am trying to upsert data into a Delta table. The scenario is as follows:

target.alias("target").merge(source.alias("source"), "target.partition_column in (partition_value) and ...").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

A very simple partition-pruning predicate followed by an upsert. The target table is split into two partitions, but my merge operation creates only one file in the partition after the write, and in the Spark UI all the data is shuffled to a single executor, which then does all the writing. That is not very efficient. Can someone please help me with this? I am using OSS Delta Lake.
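For readers following along, here is a minimal sketch of the upsert pattern being described, with the merge condition built as a string. The join key `id`, partition column `part`, and table path are placeholders I am assuming for illustration; the thread does not specify them.

```python
def build_merge_condition(partition_values, key="id", partition_col="part"):
    """Build a merge condition that prunes the target to specific
    partitions and joins on a key column. Column names are assumed,
    not taken from the original post."""
    values = ", ".join(repr(v) for v in partition_values)
    return (f"target.{partition_col} IN ({values}) "
            f"AND target.{key} = source.{key}")


def upsert(spark, target_path, source_df, partition_values):
    """Upsert source_df into the Delta table at target_path,
    touching only the listed partitions."""
    # Requires the delta-spark package; imported here so the
    # condition builder above stays usable without it.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, target_path)
    (target.alias("target")
           .merge(source_df.alias("source"),
                  build_merge_condition(partition_values))
           .whenMatchedUpdateAll()      # update rows that match the key
           .whenNotMatchedInsertAll()   # insert rows that do not
           .execute())
```

The explicit `IN (...)` on the partition column lets Delta prune file listing to the touched partitions before running the join.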
Setting the following config resolved the issue:
spark.conf.set("", "false")