
Rahul Sharma

01/16/2023, 11:38 AM
Hi all, I have run into the error "Cannot perform Merge as multiple source rows matched" several times. I checked my raw table and found no duplicate records, so why do I get this error when merging these records into the refined table? Please see the image, which shows the counts in the raw data.

Jon Stockham

01/16/2023, 11:43 AM
It's duplicates on your merge key, not on the entire record.
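Jon's distinction can be illustrated with a small plain-Python sketch (not Spark code): every row can be distinct as a whole record while several rows still share the same merge key. The column names `userid`, `name` and `updated_at` and the sample rows are hypothetical. In PySpark the equivalent key check would be something like `df.groupBy("userid").count().filter("count > 1")`, grouping by whichever columns actually appear in the merge condition.

```python
from collections import Counter


def duplicate_keys(rows, key_cols):
    """Return {key: count} for every key (tuple of key_cols values) seen more than once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample: no two rows are identical as whole records...
rows = [
    {"userid": 1, "name": "a", "updated_at": "2023-01-15"},
    {"userid": 2, "name": "b", "updated_at": "2023-01-15"},
    {"userid": 2, "name": "b", "updated_at": "2023-01-16"},
]

# ...so a whole-record check finds nothing,
print(duplicate_keys(rows, ["userid", "name", "updated_at"]))  # {}
# but the key userid is duplicated, which can make MERGE fail
# when both source rows match the same target row.
print(duplicate_keys(rows, ["userid"]))  # {(2,): 2}
```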

Rahul Sharma

01/16/2023, 11:45 AM
Yes, I also checked with GROUP BY userid HAVING COUNT(*) > 1, but there are no duplicate records.
Could compaction be causing the issue?

Jon Stockham

01/16/2023, 11:47 AM
is that exactly how you're reading from your raw data in your merge job?
You're not reading from the change data feed?

Rahul Sharma

01/16/2023, 11:48 AM
# Stream rows from the raw Delta table (a plain table read, not the change data feed)
raw_delta_df = (
    self.spark
    .readStream
    .format("delta")
    .load(self.raw_table_config['raw_delta_loc'])
)
No, I am not using CDF.

Jon Stockham

01/16/2023, 11:57 AM
I don't know, but the error message is quite clear. If there are no duplicates on your merge key in the raw table, I suggest checking your code to make sure you're not introducing them somewhere else. Perhaps write your merge data out to a separate location and inspect it.
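Jon's suggestion to inspect the merge data can also be turned into a guard that runs just before the merge. The plain-Python sketch below (not Spark code) uses a hypothetical join against a lookup table to show one way duplicates on the key can appear after the raw read even though the raw table itself is unique on `userid`; the function name, the tables and the `region` column are all hypothetical. In a Spark job the same check would typically be a `groupBy` on the merge-key columns over the DataFrame that is passed to the merge.

```python
def assert_unique_keys(rows, key_cols):
    """Raise ValueError listing any merge key that appears more than once; else return rows."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    if dupes:
        raise ValueError(f"duplicate merge keys in source: {sorted(dupes)}")
    return rows


# Hypothetical raw rows: unique on userid.
raw = [{"userid": 1}, {"userid": 2}]
# Hypothetical lookup table with two rows for userid 2.
lookup = [
    {"userid": 1, "region": "eu"},
    {"userid": 2, "region": "us"},
    {"userid": 2, "region": "apac"},
]

assert_unique_keys(raw, ["userid"])  # passes: raw is unique on userid

# An inner join on userid fans userid 2 out into two rows.
joined = [{**r, **l} for r in raw for l in lookup if r["userid"] == l["userid"]]

try:
    assert_unique_keys(joined, ["userid"])
except ValueError as err:
    print(err)  # duplicate merge keys in source: [(2,)]
```

Failing loudly before the merge points at the step that introduced the duplicates, instead of surfacing only as the generic MERGE error.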

Rahul Sharma

01/16/2023, 11:58 AM
Yes, I am writing the data to a separate location with different checkpoints.
A while ago I saw a parameter that drops duplicates while reading the data. Do you have any idea about it?

Vishal Kadam

01/17/2023, 4:29 PM
@Rahul Sharma You have duplicate records for your key in the DataFrame that you want to merge.
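Vishal's point is that the source DataFrame of the merge should hold at most one row per merge key whenever those rows would match the same target row. A common remedy is to deduplicate the source just before the merge. The plain-Python sketch below keeps the newest row per key; `updated_at` is a hypothetical ordering column, and ISO-formatted date strings are assumed so that string comparison matches chronological order. In PySpark, `dropDuplicates(["userid"])` keeps an arbitrary row per key, while keeping the latest one deterministically is usually done with a window and `row_number()` ordered by a timestamp column. If the merge runs inside `foreachBatch` (the thread does not show the merge code, so this is an assumption), the deduplication would be applied to each micro-batch.

```python
def latest_per_key(rows, key_col, order_col):
    """Keep one row per key_col value: the one with the largest order_col."""
    latest = {}
    for row in rows:
        key = row[key_col]
        # Replace the kept row only if this one is strictly newer.
        if key not in latest or row[order_col] > latest[key][order_col]:
            latest[key] = row
    return list(latest.values())


# Hypothetical micro-batch with two versions of userid 2.
batch = [
    {"userid": 1, "name": "a", "updated_at": "2023-01-15"},
    {"userid": 2, "name": "b", "updated_at": "2023-01-15"},
    {"userid": 2, "name": "b2", "updated_at": "2023-01-16"},
]

deduped = latest_per_key(batch, "userid", "updated_at")
# userid 2 now appears once, keeping the 2023-01-16 row (name "b2").
print(deduped)
```

Ties on `updated_at` keep the first row seen, so a real job would need a tie-breaker if equal timestamps are possible.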
r

Rahul Sharma

01/17/2023, 4:30 PM
Which DataFrame, raw or refined?