jacktby

08/10/2023, 4:15 PM
Does Delta Lake have a soft-delete tag, i.e. ROW_DROPPED_COL? And is CDC_TYPE_COLUMN_NAME related to concurrent MERGE INTO? Can you give me a data case to analyze this?
    // If there are N columns in the target table, the full outer join output will have:
    // - N columns for target table
    // - ROW_DROPPED_COL to define whether the generated row should be dropped or written
    // - if CDC is enabled, also CDC_TYPE_COLUMN_NAME containing the type of change being performed
    //   in a particular row
    // (N+1 or N+2 or N+3 columns depending on CDC disabled / enabled and if Row IDs are preserved)
    val outputColNames = ++
        Seq(ROW_DROPPED_COL) ++
        (if (cdcEnabled) Seq(CDC_TYPE_COLUMN_NAME) else Seq())

Nick Karpov

08/10/2023, 8:40 PM
ROW_DROPPED_COL is a boolean column generated by the merge actions as they execute on the result of the join. It is then used as a filter so that dropped rows are not written to outputDF, just a little further down from the code snippet you've shared. The CDC column is used in a similar way, but for the purpose of generating the correct CDC output.
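
For a concrete data case, here is a rough sketch in plain Scala collections (no Spark). It is not Delta's actual implementation: the names MergeSketch, Target, Source, Out, rowDropped and cdcType are made up for illustration, the `delete` flag stands in for a "WHEN MATCHED AND ... THEN DELETE" condition, and the update pre-image rows that real CDC output would also contain are left out. The change-type strings follow the values documented for the change data feed.

```scala
// Simplified model of the joined MERGE output; all names here are hypothetical.
object MergeSketch {
  // Target table with N = 2 columns.
  case class Target(id: Int, value: String)
  // Source row; `delete` stands in for a conditional DELETE clause.
  case class Source(id: Int, value: String, delete: Boolean)
  // N target columns + a row-dropped flag + an optional change type.
  case class Out(id: Int, value: String, rowDropped: Boolean, cdcType: Option[String])

  def merge(target: Seq[Target], source: Seq[Source]): Seq[Out] = {
    val t = target.map(r => r.id -> r).toMap
    val s = source.map(r => r.id -> r).toMap
    // Full outer join on id: visit every key present on either side.
    (t.keySet ++ s.keySet).toSeq.sorted.map { id =>
      (t.get(id), s.get(id)) match {
        case (Some(tr), Some(sr)) if sr.delete => // matched, delete: flag the row as dropped
          Out(tr.id, tr.value, rowDropped = true, cdcType = Some("delete"))
        case (Some(_), Some(sr)) =>               // matched, update: take the source value
          Out(sr.id, sr.value, rowDropped = false, cdcType = Some("update_postimage"))
        case (None, Some(sr)) =>                  // not matched: insert the source row
          Out(sr.id, sr.value, rowDropped = false, cdcType = Some("insert"))
        case (Some(tr), None) =>                  // target-only row: carried over unchanged
          Out(tr.id, tr.value, rowDropped = false, cdcType = None)
        case _ =>
          sys.error("unreachable: every key comes from one of the two sides")
      }
    }
  }

  // Rows that end up in the rewritten data: everything not flagged as dropped.
  def written(out: Seq[Out]): Seq[Target] =
    out.filterNot(_.rowDropped).map(o => Target(o.id, o.value))

  // Change records: only rows that carry a change type.
  def changes(out: Seq[Out]): Seq[(Int, String)] =
    out.flatMap(o => o.cdcType.map(c => (o.id, c)))

  def main(args: Array[String]): Unit = {
    val target = Seq(Target(1, "a"), Target(2, "b"), Target(3, "c"))
    val source = Seq(Source(2, "b2", delete = false),
                     Source(3, "c", delete = true),
                     Source(4, "d", delete = false))
    val out = merge(target, source)
    out.foreach(println)
    println(written(out))
    println(changes(out))
  }
}
```

In this data case id 3 matches a delete, so it shows up in the joined output with the flag set and is filtered out before writing. The flag only lives in that intermediate output; nothing in the sketch stores it in the table. The change type is likewise just a per-row label used to build the change records.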