Vincent Chee

04/21/2023, 2:20 PM
Hello delta community, is it possible to incrementally consume from multiple tables into a single Delta table? We are building a wide table of 1000 columns that will be aggregated from 10 Delta tables. The rough idea at the moment is to run sequential Spark streaming jobs (trigger=availableNow) in a while loop to sync this in near real time.

JosephK (exDatabricks)

04/21/2023, 2:49 PM
why not just 1 job with joins?

chris fish

04/21/2023, 6:03 PM
yeah you can join or union the tables together, or you could write separate streams into 1 delta table. usually the first option is better
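The two shapes chris mentions can be sketched with plain Python dicts standing in for Spark DataFrames (in Spark this would be `df_a.join(df_b, "userId", "outer")` versus `df_a.unionByName(df_b)`); the table and column names below are illustrative:

```python
# Sketch: the "join" option produces one wide row per key, filling missing
# columns with None; the "union" option would simply stack schema-aligned
# rows (table_a + table_b here). Dicts stand in for Spark DataFrames.

def outer_join(table_a, table_b, key="userId"):
    """Full outer join on `key`: one wide row per key, None where a side
    has no value for a column."""
    cols = set()
    for row in table_a + table_b:
        cols |= set(row) - {key}
    merged = {}
    for row in table_a + table_b:
        merged.setdefault(row[key], {key: row[key], **{c: None for c in cols}}) \
              .update({c: v for c, v in row.items() if c != key})
    return list(merged.values())

# Illustrative rows: user A appears in both source tables, user B only in one.
table_a = [{"userId": "A", "colA": 1}]
table_b = [{"userId": "A", "colB": 2}, {"userId": "B", "colB": 1}]
wide = outer_join(table_a, table_b)
```

The union option instead produces two narrow rows per user that downstream readers would have to re-aggregate, which is why the join is usually preferred here.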

Vincent Chee

04/22/2023, 2:56 AM
Join works, but we are unable to join rows incrementally (we don't want to re-join unchanged rows on each run). Imagine joining rows incrementally: for instance, userId=A lands in tableA and userId=B lands in tableB. The incremental join would result in:
1. row(userId=A, colA=1, colB=N/A)
2. row(userId=B, colA=N/A, colB=1)
For the incremental update, we are thinking of using MERGE INTO (partial update) into the target table so we retain column values from the other sources. For your reference,
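The partial-update MERGE idea can be sketched as follows, with plain Python dicts simulating the table state (in Delta this would be `MERGE INTO ... WHEN MATCHED THEN UPDATE SET` listing only the incoming source's columns, or `whenMatchedUpdate` with an explicit column map in the Python API); all names are illustrative:

```python
def merge_partial(target, updates, key="userId"):
    """Upsert rows, overwriting only the non-null columns present in each
    update, so values already filled in from the other source tables are
    retained. Mimics a Delta MERGE with a partial UPDATE SET column list."""
    by_key = {row[key]: dict(row) for row in target}
    for upd in updates:
        row = by_key.setdefault(upd[key], {key: upd[key]})
        row.update({c: v for c, v in upd.items() if v is not None})
    return list(by_key.values())

# tableA's stream delivered colA for user A on an earlier run; now tableB's
# stream delivers colB for users A and B.
target = [{"userId": "A", "colA": 1, "colB": None}]
updates = [{"userId": "A", "colB": 2}, {"userId": "B", "colB": 1}]
merged = merge_partial(target, updates)
```

Because the update only touches the columns carried by the incoming batch, user A keeps colA=1 while gaining colB=2, and user B is inserted with only the columns its source provides.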