Rahul Sharma
01/11/2023, 12:37 PM%%sql
MERGE INTO delta.`refine_data` v
USING delta.`raw_data` u
ON v.userID=u.userID
WHEN MATCHED AND (u.__op = "u" or u.__op = "d")
THEN DELETE
Kashyap Bhatt
01/11/2023, 3:07 PMJoão Pinto
01/11/2023, 3:27 PMRahul Sharma
01/11/2023, 3:28 PMJoão Pinto
01/11/2023, 3:30 PMRahul Sharma
01/11/2023, 3:33 PMNick Karpov
01/11/2023, 4:24 PMBut I don’t have any time stamp functions to remove duplicate exam can’t use window functionif all the rows in your
refine
with the same id are the same for the remaining fields, you can simply run distinct
or dropDuplicates
... if they aren't exactly the same it'll be up to you how to handle them and consolidate them into a single row..., MERGE
cannot decide this for you, hence the errorRahul Sharma
01/11/2023, 4:34 PMNick Karpov
01/11/2023, 4:44 PMraw_data
table, it can still have the original data with the duplicates, you can deduplicate as a processing step before MERGE, something like:
%%sql
CREATE TEMPORARY VIEW processed_raw_data AS (
SELECT ...(deduplicate logic)... FROM delta.`raw_data`
)
MERGE INTO delta.`refine_data` v
USING processed_raw_data u
ON v.userID=u.userID
WHEN MATCHED AND (u.__op = "u" or u.__op = "d")
THEN DELETE
Rahul Sharma
01/11/2023, 5:15 PMKashyap Bhatt
01/11/2023, 5:16 PM