
Joydeep Banik Roy

05/05/2023, 9:59 AM

Gerhard Brueckl

05/05/2023, 10:11 AM
Not sure, but wouldn't you also want to add source.country = 'USA' to your merge condition?
deltaTable.as("target")
  .merge(df.as("source"),
    // Earlier the condition would have looked like:
    // "target.id = source.id and target.country = source.country"
    "target.id = source.id and target.country = 'USA'")
  .whenMatched()
  .updateAll()
  .whenNotMatched()
  .insertAll()
  .execute()

Joydeep Banik Roy

05/05/2023, 10:39 AM
True, sorry I missed that. Also, in our case the source was already partitioned by country and we picked the partitions separately, but that might not be immediately apparent to the reader; let me add that part
thanks Gerhard

Gerhard Brueckl

05/05/2023, 11:09 AM
Also, how would you parallelize the load then, as there is only one source_df?

Joydeep Banik Roy

05/05/2023, 12:44 PM
Maybe I am understanding the question wrong, but we would launch multiple Spark Delta jobs with the same dataframe, one per country value. Is this what you are asking?
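To make the per-country parallelism concrete, here is a minimal sketch. The country list and the `mergeForCountry` helper are made up for illustration; in the real job its body would be the `deltaTable.merge(...)` call from the snippet above, with the country baked into both the source filter and the merge condition. Since each country touches disjoint partitions of the target table, the merges can be launched concurrently:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelMerge {
  // Hypothetical per-country merge. In the real job this would run:
  //   deltaTable.as("target")
  //     .merge(df.filter(s"country = '$country'").as("source"),
  //            s"target.id = source.id and target.country = '$country'")
  //     .whenMatched().updateAll()
  //     .whenNotMatched().insertAll()
  //     .execute()
  def mergeForCountry(country: String): String = s"merged $country"

  def main(args: Array[String]): Unit = {
    // Assumed partition values; in practice these would come from the source data
    val countries = Seq("USA", "CAN", "MEX")
    // One concurrent merge per country; the partitions are disjoint, so the
    // merges do not conflict with each other on the target table
    val results = Future.sequence(countries.map(c => Future(mergeForCountry(c))))
    Await.result(results, 1.minute).foreach(println)
  }
}
```

Whether this actually helps depends on cluster capacity: each merge is itself a distributed Spark job, so running them concurrently only pays off when a single merge cannot saturate the cluster.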