https://delta.io logo
j

jacktby jacktby

08/10/2023, 12:43 PM
for merge into internally, why not use only once join but use two joins? Are there some special reasons?
t

Tom van Bussel

08/10/2023, 1:33 PM
Yes, this helps us avoid rewriting all files for every merge. The first join is responsible for figuring out which files need to be updated, and the second join actually performs the rewrite (but is limited to the set of files that were determined by the first join).
j

jacktby jacktby

08/10/2023, 1:47 PM
sorry, I think you can use once join to get that. It's that you can use the targetOnlyPredicate to filter out the data files, and then use join with source table, maybe right outer join, and then you can see if there is a null in the join rows to determine do match clause and not match clause, I think this method won't rewrite all files for every merge.
By the way, maybe I mistake your meaning, the two designs considers about concurrent merge into?
t

Tom van Bussel

08/10/2023, 1:48 PM
targetOnlyPredicate does not capture everything. Some files in the target may not be affected if there are no matching rows for them in the source. We need to use the join to figure out if there are matching source rows.
Consider the following merge statement for example:
Copy code
MERGE INTO orders t
USING order_mutations s
ON t.date = s.date AND t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
Suppose the
orders
is clustered (partitioned or z-ordered) by
date
. And suppose that
order_mutations
only contains very recent dates.
If we relied only on
targetOnlyPredicates
(which is empty) then we would have to rewrite all files in the target table. But by joining with the source we can figure out that we only have to rewrite a small subset of the files in the target table.
j

jacktby jacktby

08/10/2023, 1:52 PM
sorry, I give the wrong meaning, I say use merge_option, not targetOnlyPredicates
t

Tom van Bussel

08/10/2023, 1:52 PM
What is
merge_option
?
j

jacktby jacktby

08/10/2023, 1:52 PM
the condition of on xxx
t

Tom van Bussel

08/10/2023, 1:53 PM
Yes, but the merge condition refers to both the target and the source. It is a join condition.
So “using it” means joining the source and the target with each other.
j

jacktby jacktby

08/10/2023, 1:58 PM
thanks I make a mistake, we must finish a whole join and then we can determine match or not