Douglas Pires Martins

03/01/2023, 6:23 PM
Hi guys, it's great to be here with you all! Yesterday I raised a question with @Denny Lee on LinkedIn and he suggested posting it here to provoke you all. A few days ago, I had a discussion with my tech lead about the Lakehouse framework using Delta Lake. We were debating (a healthy discussion) whether the lakehouse approach applies Uncle Bob's Clean Architecture. He has a background in software engineering, by the way. He said the bronze, silver, and gold layers are not comparable to, for example, "Entities" and "Use Cases", using TDD, and so on. As far as I know, the lakehouse is built to be an SSOT (single source of truth), among other AMAZING things. Do you have any thoughts to share, or even similarities that I can show him? We use dbx, Databricks, Scala, and PySpark to develop our jobs. Thanks in advance, many thanks!


03/01/2023, 6:24 PM
I have some thoughts on uncle bob 😂

Douglas Pires Martins

03/03/2023, 3:06 PM
What do you mean?👀

Denny Lee

03/06/2023, 10:56 PM
Hmm - okay some quick thoughts:
• The bronze, silver, gold layering is a data quality framework in which, at each stage, the data at the next level is cleaner than at the previous one.
• The concept of a medallion architecture is orthogonal to the development lifecycle, whether that is TDD, agile, or any other development lifecycle.
• The lakehouse itself is related to the medallion architecture in that we use the medallion architecture to keep the lakehouse clean. BUT, you could employ the medallion data quality framework on a data lake, data warehouse, document store, etc.
• Instead, the lakehouse is about taking the best of the database/data warehouse (simplicity, easier manageability, transactional consistency) and data lakes (scalability and flexibility).
• The lakehouse absolutely should be set up with SSOT in mind, but there are many valid exceptions to that generality.
• In terms of entities and use cases, this is apropos to the context of data marts, where we built databases from data warehouses that were specific to a use case or business domain (or entity).
• You can do the equivalent with lakehouses as well, i.e. build "lakerooms" or "lakemarts" for your specific entities or use cases.
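
To make the data quality point concrete, here is a minimal sketch in plain Python (not Spark or Delta Lake) of data getting cleaner at each medallion stage. Everything in it is an assumption for illustration: the sample orders, the field names, and the hypothetical helpers `to_silver` and `to_gold`. In a real lakehouse each stage would typically be a table rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in a bronze layer: stored as-is,
# including duplicates and malformed rows.
bronze = [
    {"order_id": "1", "customer": "ana", "amount": "10.50"},
    {"order_id": "1", "customer": "ana", "amount": "10.50"},  # duplicate
    {"order_id": "2", "customer": "bob", "amount": "oops"},   # unparseable amount
    {"order_id": "3", "customer": "", "amount": "7.00"},      # missing customer
    {"order_id": "4", "customer": "bob", "amount": "4.25"},
]


def to_silver(rows):
    """Validate, type-cast, and deduplicate bronze rows."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if not row["customer"] or row["order_id"] in seen:
            continue  # drop rows with no customer, and repeated order ids
        seen.add(row["order_id"])
        clean.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"],
                "amount": amount,
            }
        )
    return clean


def to_gold(rows):
    """Aggregate silver rows into a use-case-specific view: spend per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Because each stage is an ordinary function over data, it can be unit-tested (TDD or otherwise) regardless of the layering, which is one way to read the point that the medallion architecture is orthogonal to the development lifecycle. The gold step here plays the role of a small use-case-specific "lakemart".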