nikhil raman

03/17/2023, 7:34 PM
Hi all, I am new to Delta Lake and I am trying to upsert data into a Delta table. The scenario is as follows:

target.alias("target").merge(source.alias("source"), "target.partition_column in (partition_value) and ...").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

A very simple partition-pruning predicate followed by an upsert. The target table is split into two partitions, but my merge operation creates only one file in the partition after the write, and in the Spark UI all the data is shuffled to a single executor, which then does all the writing. That is not very efficient. Can someone please help me with this? I am using OSS Delta Lake.
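For readers following along, here is a minimal sketch of the upsert pattern being described, with the merge condition built as a string. The join key `id`, partition column `part`, and table path are placeholders I am assuming for illustration; the thread does not specify them.

```python
def build_merge_condition(partition_values, key="id", partition_col="part"):
    """Build a merge condition that prunes the target to specific
    partitions and joins on a key column. Column names are assumed,
    not taken from the original post."""
    values = ", ".join(repr(v) for v in partition_values)
    return (f"target.{partition_col} IN ({values}) "
            f"AND target.{key} = source.{key}")


def upsert(spark, target_path, source_df, partition_values):
    """Upsert source_df into the Delta table at target_path,
    touching only the listed partitions."""
    # Requires the delta-spark package; imported here so the
    # condition builder above stays usable without it.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, target_path)
    (target.alias("target")
           .merge(source_df.alias("source"),
                  build_merge_condition(partition_values))
           .whenMatchedUpdateAll()      # update rows that match the key
           .whenNotMatchedInsertAll()   # insert rows that do not
           .execute())
```

The explicit `IN (...)` on the partition column lets Delta prune file listing to the touched partitions before running the join.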
Setting the following config resolved the issue:
spark.conf.set("", "false")