Satyadeep Sinha

04/18/2023, 2:39 AM
Is it correct to use a Delta file (a file mapped to the target schema) as the bronze layer, instead of ingesting the raw (as-is) file, in a Medallion architecture? I know there are cons, but I would like to understand whether anyone has implemented this kind of architecture. If so, did you find any issues with it later on? I'd appreciate your replies on this.

Jim Hibbard

04/18/2023, 2:45 AM
Hi Sat, there's some flexibility here depending on your use case / situation. I largely consider the differences to be at the terminology level and not the implementation level. The bronze layer is usually thought of as a lossless format conversion from whatever the input is into a Delta Table. You would also usually add metadata columns with information like when a row was ingested, the source file, etc. When files are the input to your ETL pipeline, some people have a directory where all those files land and call that the bronze layer, even though the files are not in Delta format. Others would call this a landing zone and consider the bronze layer to be the first Delta Table.
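To make the "lossless conversion plus metadata columns" idea concrete, here is a minimal, library-free Python sketch. It uses a plain list of dicts as a stand-in for a Delta Table, and everything named in it (`ingest_to_bronze`, the `_source_file` / `_source_row` / `_ingested_at` columns, the sample CSV and file path) is a hypothetical illustration, not part of any Delta Lake or Databricks API.

```python
import csv
import io
from datetime import datetime, timezone


def ingest_to_bronze(raw_text, source_file, ingested_at=None):
    """Turn raw CSV text into bronze rows.

    Every source field is kept exactly as it arrived (strings, no casting),
    and metadata columns are added to record where and when it was ingested.
    """
    ingested_at = ingested_at or datetime.now(timezone.utc).isoformat()
    reader = csv.DictReader(io.StringIO(raw_text))
    bronze_rows = []
    for row_number, record in enumerate(reader, start=1):
        row = dict(record)  # lossless: nothing dropped, renamed, or cast
        row["_source_file"] = source_file   # hypothetical metadata column
        row["_source_row"] = row_number     # hypothetical metadata column
        row["_ingested_at"] = ingested_at   # hypothetical metadata column
        bronze_rows.append(row)
    return bronze_rows


# Hypothetical landing-zone file contents and path.
raw = "order_id,amount,coupon\n1001,25.50,SPRING\n1002,10.00,\n"
bronze = ingest_to_bronze(raw, "landing/orders_2023-04-18.csv")

for row in bronze:
    print(row)
```

In a real pipeline the same shape would typically be written to a Delta Table rather than held in memory; the point of the sketch is only that bronze keeps every source column and adds ingestion metadata alongside it.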
This LinkedIn post / cheat sheet might be helpful to you. I also like this blog on the subject. Hope that helps!
In the scenario you're outlining, you're somewhat skipping the bronze layer and going from "Landing Zone" --> "Silver Layer". The downside is that if you later want to use an additional value from the source records, re-processing your data will require re-parsing a file format that may be inefficient to parse. If you were instead re-processing from a bronze layer that was a 1:1 representation of the source files, the re-processing would be more efficient. It basically saves you from doing an inefficient record parse more than once.
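A toy Python sketch of that trade-off, under stated assumptions: the source is JSON lines, `parse_raw_record` stands in for whatever expensive parse your real format needs, and plain lists of dicts stand in for Delta Tables. All function names and sample records here are hypothetical. The counter shows that adding a column to silver later does not trigger another parse of the raw files when a 1:1 bronze copy exists.

```python
import json

parse_calls = 0  # counts how many times a raw record is parsed


def parse_raw_record(line):
    """Stand-in for an expensive parse of one source record."""
    global parse_calls
    parse_calls += 1
    return json.loads(line)


def build_bronze(raw_lines):
    """Parse every raw record once and keep all of its fields (1:1 copy)."""
    return [parse_raw_record(line) for line in raw_lines]


def build_silver(bronze_rows, columns):
    """Project only the columns the silver table currently needs."""
    return [{name: row[name] for name in columns} for row in bronze_rows]


# Hypothetical landing-zone records.
raw_lines = [
    '{"order_id": 1001, "amount": 25.5, "coupon": "SPRING"}',
    '{"order_id": 1002, "amount": 10.0, "coupon": null}',
]

bronze = build_bronze(raw_lines)

# First version of silver only needs two columns.
silver_v1 = build_silver(bronze, ["order_id", "amount"])

# Later, "coupon" is needed too: rebuild from bronze, no re-parse of raw files.
silver_v2 = build_silver(bronze, ["order_id", "amount", "coupon"])

print(parse_calls)
```

If silver were built straight from the landing zone instead, the second version would have to call `parse_raw_record` on every raw line again, which is the repeated cost described above.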