https://delta.io logo
r

Rahul Sharma

01/11/2023, 12:37 PM
Hii All, several time i got Cannot perform Merge as multiple source rows matched do anyone have better idea how to resolve this problem. i use below command but still getting error ,i don’t have any timestamp column
Copy code
%%sql
MERGE INTO delta.`refine_data` v
USING delta.`raw_data` u
ON v.userID=u.userID
WHEN MATCHED AND (u.__op = "u" or u.__op = "d")
THEN DELETE
k

Kashyap Bhatt

01/11/2023, 3:07 PM
Did you get a chance to read the very detailed error message and referenced page?
j

João Pinto

01/11/2023, 3:27 PM
Did you checked if your source table have duplicated rows?
r

Rahul Sharma

01/11/2023, 3:28 PM
Yes I have duplicates rows in raw table then how to drop any idea
j

João Pinto

01/11/2023, 3:30 PM
Any distinct command or sth like that can help you to get rid of duplicated rows.
r

Rahul Sharma

01/11/2023, 3:33 PM
But I don’t have any time stamp functions to remove duplicate exam can’t use window function
Or wham I found some specific userid which presents in refine but delete not work in nested query
n

Nick Karpov

01/11/2023, 4:24 PM
But I don’t have any time stamp functions to remove duplicate exam can’t use window function
if all the rows in your
refine
with the same id are the same for the remaining fields, you can simply run
distinct
or
dropDuplicates
... if they aren't exactly the same it'll be up to you how to handle them and consolidate them into a single row...,
MERGE
cannot decide this for you, hence the error
r

Rahul Sharma

01/11/2023, 4:34 PM
if i delete duplicate data from raw zone does it impact on refine streaming
n

Nick Karpov

01/11/2023, 4:44 PM
you don't actually need to change the
raw_data
table, it can still have the original data with the duplicates, you can deduplicate as a processing step before MERGE, something like:
Copy code
%%sql

CREATE TEMPORARY VIEW processed_raw_data AS (
    SELECT ...(deduplicate logic)... FROM delta.`raw_data`
)

MERGE INTO delta.`refine_data` v
USING processed_raw_data u
ON v.userID=u.userID
WHEN MATCHED AND (u.__op = "u" or u.__op = "d")
THEN DELETE
r

Rahul Sharma

01/11/2023, 5:15 PM
Oh ok I was doing only one mistake I was not duduplicating data from raw, you can see my query above
k

Kashyap Bhatt

01/11/2023, 5:16 PM
yupp, that's what the error message as well as the article linked in the error message says.
🫣 1
3 Views