
Rahul Sharma

01/24/2023, 11:10 AM
Hi team, I have hit a similar error many times and posted here about it before, but haven't found a solution yet:
An error was encountered:
An error occurred while calling o86.sql.
: org.apache.spark.sql.delta.DeltaUnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same
target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge,
when multiple source rows match on the same target row, the result may be ambiguous
as it is unclear which source row should be used to update or delete the matching
target row. You can preprocess the source table to eliminate the possibility of
multiple matches.
I have the data below for raw and refine and am performing a merge. I have manually verified the data is not duplicated or the same in raw and refine:
raw-data

+--------+--------------+-------------+------------+--------------------+-------------------+------------------+--------------+-----------------------+-----------------------+------------------+-----------+---------------------+----+---------+--------
|10|10000.0000    |10000        |0           |1000.0000           |0.0000             |0                 |156           |2021-12-08 17:12:59.457|2023-01-23 17:44:13.6  |null              |null       |true                 |u   |1674476053600 |false    
|11|200000.0000   |10002        |0           |9990.0000           |0.0000             |0                 |1             |2021-12-16 18:40:12.16 |2023-01-24 13:18:44.047|null              |null       |true                 |u   |1674546524047 |false    
+--------+--------------+-------------+------------+--------------------+-------------------+------------------+--------------+-----------------------+-----------------------+------------------+-----------+---------------------+----+---------+--------

refine-data
+--------+--------------+-------------+------------+--------------------+-------------------+------------------+--------------+-----------------------+-----------------------+------------------+-----------+---------------------+----+---------+----------
|10|10000.0000    |10000        |0           |1000.0000           |0.0000             |0                 |156           |2021-12-08 17:12:59.457|2023-01-23 15:55:10.71 |null              |null       |true                 |u    |1674469510710 |false    
|11|200000.0000   |10000        |0           |9990.0000           |0.0000             |0                 |1             |2021-12-16 18:40:12.16 |2023-01-23 11:07:57.187|null              |null       |true                 |u     |1674452277187 |false   
+--------+--------------+-------------+------------+--------------------+-------------------+------------------+--------------+-----------------------+-----------------------+------------------+-----------+---------------------+----+---------+-----------
Please look into it proactively.
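A quick way to confirm whether a batch really contains more than one source row per merge key is a simple grouped count. This is a minimal sketch, assuming the source view is named raw and the join key is userID, as in the merge query shared later in the thread:

-- List merge keys that appear more than once in the source batch;
-- any such key can trigger the "multiple source rows matched" error
SELECT userID, COUNT(*) AS source_rows
FROM raw
GROUP BY userID
HAVING COUNT(*) > 1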

Kashyap Bhatt

01/24/2023, 2:57 PM
Can you post a reproducible example? code snippet..

Rahul Sharma

01/24/2023, 4:07 PM
I have a CDC platform, so I can't provide a reproducible example.
Can we connect regarding this?

Nick Karpov

01/24/2023, 4:21 PM
Is that the raw and refine-data that cause the problem? Can you share the exact query then?

Rahul Sharma

01/24/2023, 4:29 PM
%%sql
MERGE INTO delta_test.test_refine v
USING raw u
ON v.userID=u.userID
WHEN MATCHED AND (u.__op = "d"  and u.__deleted='false')
THEN DELETE
WHEN MATCHED AND u.__op = "u"
THEN UPDATE SET *
WHEN NOT MATCHED AND (u.__op = "c" or u.__op = "r")
THEN INSERT *
Any update @Nick Karpov @Kashyap Bhatt?
I have applied a rank function in the refine job and now the streaming runs perfectly. I found the issue: if the same userID has multiple rows in one batch, the merge throws this error. Thanks.
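For reference, deduplicating the source inside the MERGE itself could look roughly like the sketch below. This is not necessarily Rahul's exact fix: the ordering column __ts_ms is a hypothetical change-timestamp, and the dedup keeps only the latest event per userID via a MAX join rather than by adding a ROW_NUMBER column, so that UPDATE SET * / INSERT * don't see an extra column missing from the target schema. Ties on __ts_ms would still leave duplicates, so a real pipeline should break ties explicitly.

%%sql
MERGE INTO delta_test.test_refine v
USING (
  -- keep only the most recent change event per userID within this batch
  SELECT u.*
  FROM raw u
  JOIN (
    SELECT userID, MAX(__ts_ms) AS max_ts   -- __ts_ms: assumed ordering column
    FROM raw
    GROUP BY userID
  ) latest
    ON u.userID = latest.userID AND u.__ts_ms = latest.max_ts
) u
ON v.userID = u.userID
WHEN MATCHED AND (u.__op = 'd' AND u.__deleted = 'false') THEN DELETE
WHEN MATCHED AND u.__op = 'u' THEN UPDATE SET *
WHEN NOT MATCHED AND (u.__op = 'c' OR u.__op = 'r') THEN INSERT *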

Kashyap Bhatt

01/25/2023, 4:53 PM
Rahul Sharma [5:10 AM]
Please look into it proactively.
Kashyap Bhatt [8:57 AM]
Can you post a reproducible example? code snippet..
Rahul Sharma [10:07 AM]
I have a CDC platform, so I can't provide a reproducible example.
Perhaps someone with a vested interest, like Databricks folks, has time and is willing to spend it on this without that info... I don't, unfortunately.

Nick Karpov

01/25/2023, 5:21 PM
I found the issue: if the same userID has multiple rows in one batch, the merge throws this error.
@Rahul Sharma awesome, glad you figured it out... to my eyes this is exactly what the error message indicated from the start, but is there something we could change that would have made it clearer initially?