
Josué Castañeda Landa

03/31/2023, 6:32 AM
Good evening everyone, I have a question. I'm doing a merge in Delta Lake, but when I look at the data, it seems that the whenNotMatchedInsertAll condition doesn't work. Here is a sample of the data.

Jim Hibbard

03/31/2023, 7:46 AM
Hi Josué, would you be able to share your code? Thanks!

Josué Castañeda Landa

03/31/2023, 7:47 AM
Yes, I used this code for the merge logic:
delta_table.alias("old").merge(
    df_spark.alias("new"),
    "old.profile_objectid = new.profile_objectid"
    + " AND old.profile_identity = new.profile_identity"
    + " AND old.clevertap_event_date = new.clevertap_event_date"
    + " AND old.profile_os_version = new.profile_os_version"
    + " AND old.profile_platform = new.profile_platform"
    + " AND old.profile_make = new.profile_make"
    + " AND old.profile_model = new.profile_model"
    + " AND old.profile_profiledata_completename = new.profile_profiledata_completename"
    + " AND old.session_props_session_source = new.session_props_session_source"
    + " AND old.dominio = new.dominio"
    + " AND old.is_email_valid = new.is_email_valid"
    + " AND old.year = new.year"
    + " AND old.month = new.month",
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
It seems that it only happens the second time; when I run it again, it does not generate duplicates.

Jon Stockham

03/31/2023, 10:56 AM
So the records that hit whenNotMatchedInsertAll() on the first run are already present in the Delta table for the second run. If you run the same records through again, they will all hit whenMatchedUpdateAll() instead. So no, you should not expect duplicates.
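
To make the two-run behaviour concrete, here is a rough plain-Python sketch of the upsert logic described above. It is only a stand-in for the merge semantics, not the Delta Lake API: the `merge` helper, the two-column key, and the sample rows are all made up for illustration.

```python
# Plain-Python stand-in for MERGE semantics; this is NOT the Delta Lake API.
# The helper name, the key columns chosen, and the rows are hypothetical.

def merge(target, source, keys):
    """Upsert source rows into target, matching on the given key columns.

    Matching is done against the target as it was when the merge started,
    so source rows are never matched against each other.
    Note: Python treats None == None as true, unlike SQL NULL = NULL,
    so NULL-valued keys are not modelled here.
    """
    index = {tuple(row[k] for k in keys): i for i, row in enumerate(target)}
    inserted = updated = 0
    for row in source:
        key = tuple(row[k] for k in keys)
        if key in index:
            target[index[key]] = dict(row)  # plays the role of whenMatchedUpdateAll
            updated += 1
        else:
            target.append(dict(row))        # plays the role of whenNotMatchedInsertAll
            inserted += 1
    return inserted, updated


keys = ["profile_objectid", "clevertap_event_date"]
batch = [
    {"profile_objectid": "a1", "clevertap_event_date": "2023-03-30", "dominio": "example.com"},
    {"profile_objectid": "b2", "clevertap_event_date": "2023-03-30", "dominio": "example.org"},
]

table = []
first_run = merge(table, batch, keys)   # every row is new, so all are inserted
second_run = merge(table, batch, keys)  # every row now matches, so all are updated
print(first_run, second_run, len(table))
```

In this sketch the first run reports only inserts and the second run reports only updates, while the table keeps the same number of rows, which is the behaviour described for the real merge.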