jacktby
08/10/2023, 4:15 PM
// If there are N columns in the target table, the full outer join output will have:
// - N columns for target table
// - ROW_DROPPED_COL to define whether the generated row should be dropped or written
// - if CDC is enabled, also CDC_TYPE_COLUMN_NAME containing the type of change being performed
// in a particular row
// (N+1 or N+2 or N+3 columns depending on CDC disabled / enabled and if Row IDs are preserved)
val outputColNames =
  targetOutputCols.map(_.name) ++
  Seq(ROW_DROPPED_COL) ++
  (if (cdcEnabled) Seq(CDC_TYPE_COLUMN_NAME) else Seq())
Nick Karpov
08/10/2023, 8:40 PM
ROW_DROPPED_COL is a boolean column generated by the actions as they execute on the result of the join. It is then used as a filter so that dropped rows are not written to outputDF, just a little further down from the code snippet you've shared. The CDC column is used similarly, but for generating the correct CDC output.
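To make the filter step concrete, here is a rough sketch in plain Scala collections rather than Spark DataFrames. It only illustrates the idea of a boolean marker column that is filtered on and then dropped before writing. `RowDroppedSketch`, `JoinedRow`, `rowsToWrite`, and the two constant values are hypothetical stand-ins, not the actual Delta Lake definitions.

```scala
object RowDroppedSketch {
  // Placeholder values (assumption); the real constants live in Delta Lake.
  val ROW_DROPPED_COL = "row_dropped_placeholder"
  val CDC_TYPE_COLUMN_NAME = "cdc_type_placeholder"

  // Same shape as the quoted snippet: target columns, then the marker
  // column, then the CDC type column only when CDC is enabled.
  def outputColNames(targetCols: Seq[String], cdcEnabled: Boolean): Seq[String] =
    targetCols ++
      Seq(ROW_DROPPED_COL) ++
      (if (cdcEnabled) Seq(CDC_TYPE_COLUMN_NAME) else Seq())

  // Hypothetical row produced by the join after the merge actions have run.
  final case class JoinedRow(
      id: Int,
      value: String,
      rowDropped: Boolean,
      cdcType: Option[String])

  // Keep only rows not marked as dropped, then discard the marker itself
  // so that only target-table columns are written.
  def rowsToWrite(joined: Seq[JoinedRow]): Seq[(Int, String)] =
    joined.filter(!_.rowDropped).map(r => (r.id, r.value))
}
```

For example, given one updated row and one row that an action marked as dropped, `rowsToWrite` keeps only the first and strips the marker. In the real code the same thing happens on a DataFrame, with a filter on the marker column followed by dropping that column.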