https://delta.io logo
r

Roberto

02/02/2023, 2:58 PM
Hi All, I have a question related to merge command. Is possible when matched to use the .updateExpr to sum the value for the original table with the new value?
Imagine that I have a table with two columns. Id (String) and total (Long). For example:
Copy code
+-------------------+-----+
|user_id            |total|
+-------------------+-----+
|1                  |177  |
|2                  |147  |
|3                  |146  |
+-------------------+-----+
And I have a DF with these values:
Copy code
+-------------------+-----+
|user_id            |total|
+-------------------+-----+
|1                  |10   |
|2                  |5    |
|3                  |3    |
+-------------------+-----+
Can I use
updateExpre()
to update the value for the Total column sum? The result should be:
Copy code
+-------------------+-----+
|user_id            |total|
+-------------------+-----+
|1                  |187  |
|2                  |142  |
|3                  |149  |
+-------------------+-----+
Is possible to do that? or do I need to do the aggregation in spark and after that re-write the whole table?
h

Harry Metzger

02/02/2023, 3:59 PM
You can pass in any spark function. I do something like this:
Copy code
update_dict = {
        col_name: coalesce(
            col(f'source.{col_name}'),
            col(f'target.{col_name}'),
               lit(None),
            )
            for col_name in df.columns
        }
and then pass that dict to the merge call
r

Roberto

02/02/2023, 4:38 PM
ok, I use update instead of updateExpr
Copy code
<http://deltaTable.as|deltaTable.as>("production")
  .merge(
    <http://batch.as|batch.as>("Landing"),
    s"Landing.user_id <=> production.user_id")
  .whenMatched()
  .update(Map(
    "total" -> (col("production.total") + col("Landing.total")))
  .whenNotMatched().insertAll()
  .execute()
and it works for me!
🙌 1
h

Harry Metzger

02/02/2023, 4:39 PM
Nice! I wish the documentation was clearer that this was possible.
r

Roberto

02/02/2023, 4:43 PM
Not at all. If you read https://docs.delta.io/latest/delta-update.html#table-deletes-updates-and-merges, you can't find examples of that. In any case, if you look at the API the difference between UpdateExpr (Map[String, String]) and Update (Map[String, Column]) it's clear
5 Views