Phúc Võ Hồng

03/06/2023, 2:19 AM
Hi guys, I am working on the medallion architecture for Delta, but there are limited resources to read. Databricks says the gold layer requires fewer joins and is denormalized, while the silver layer is in 3rd normal form. So:
1. The silver tables are the dims of the star schema, and we can join all the dims into the universal set (as the gold table).
2. For the silver layer:
   A. If join: silver can only be generated by joining bronze tables on a primary key.
   B. For other operations: it mainly supports cleaning, filtering, augmenting, and creating new columns.
3. After the gold table is generated, which kind of data should it hold?
   A. The fully joined table of the star schema
   B. Summarized data after a group-by and aggregation (group by zone, agg count(*))
Can anyone with experience help me clarify these concerns?

Jim Hibbard

03/26/2023, 8:43 AM
Hello, the gold table should hold records that are ready to be consumed. This could mean aggregates, a specific feature of the data, etc. It depends on your use case. This also means that you'll likely have multiple gold tables coming from the same silver table(s) if your data is consumed in different ways for different downstream use cases. Some materials that may help you:
• https://www.linkedin.com/feed/update/urn:li:activity:7036371945382170625/
• https://www.databricks.com/blog/2022/06/24/data-warehousing-modeling-techniques-and-their-implementation-on-the-databricks-lakehouse-platform.html
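To make the "multiple gold tables from the same silver tables" idea concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for Delta tables so it runs anywhere; the table names (`silver_trips`, `silver_zones`, `gold_trips_wide`, `gold_trips_by_zone`) and the sample rows are made up for illustration and are not from any Databricks material. It builds both options from the question: a fully joined, denormalized table and a grouped summary per zone.

```python
import sqlite3

# In-memory database standing in for the lakehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Silver: cleaned, normalized tables (hypothetical names and data).
CREATE TABLE silver_zones (zone_id INTEGER PRIMARY KEY, zone_name TEXT);
CREATE TABLE silver_trips (trip_id INTEGER PRIMARY KEY, zone_id INTEGER, fare REAL);
INSERT INTO silver_zones VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO silver_trips VALUES (10, 1, 12.5), (11, 1, 8.0), (12, 2, 40.0);

-- Gold table 1: denormalized, fully joined records (option 3A in the question).
CREATE TABLE gold_trips_wide AS
SELECT t.trip_id, t.fare, z.zone_name
FROM silver_trips t
JOIN silver_zones z ON t.zone_id = z.zone_id;

-- Gold table 2: aggregated summary per zone (option 3B in the question).
CREATE TABLE gold_trips_by_zone AS
SELECT z.zone_name, COUNT(*) AS trip_count, SUM(t.fare) AS total_fare
FROM silver_trips t
JOIN silver_zones z ON t.zone_id = z.zone_id
GROUP BY z.zone_name;
""")

# Each gold table serves a different consumer: row-level analysis vs. a dashboard.
wide = conn.execute(
    "SELECT trip_id, fare, zone_name FROM gold_trips_wide ORDER BY trip_id"
).fetchall()
by_zone = conn.execute(
    "SELECT zone_name, trip_count, total_fare FROM gold_trips_by_zone ORDER BY zone_name"
).fetchall()

print(wide)
print(by_zone)
```

Both gold tables come from the same two silver tables, so 3A and 3B are not mutually exclusive; which ones you build depends on who consumes them. On Delta the same two queries could be written in Spark SQL, but that part is left out here to keep the sketch dependency-free.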