Hubert Kaczmarczyk

04/12/2023, 8:56 AM
Autoloader question: Can autoloader detect when a new feed is missing columns? From what I see, it automatically fills the missing columns with nulls.

Anirban Goswami

04/12/2023, 9:01 AM
That is the beauty of schema evolution and schema-on-read systems.

Randy Sims

04/12/2023, 11:52 AM
Yeah, that's from automerging the schema. Autoloader isn't detecting that per se; it's more about what happens on the write itself.

Adam Zolnowski

04/12/2023, 2:53 PM
Yes, it's a beauty in some scenarios, but not in all. We (with Hubert 🙂) are actually trying to build a generic service that can load parquet files from defined locations. The parquet files are created by upstream services. We want to be able to detect schema changes and classify them as:
• non-breaking changes - new column added - allowed
• breaking changes - column removed - not allowed
For example, if a completely different file is dropped into a location by mistake, we don't want autoloader to create "n" additional columns and fill the rest with NULLs. 😅
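A minimal sketch of the classification described above, assuming the check runs on plain column-name lists before anything is written. The names `SchemaDiff` and `classify_schema_change` are hypothetical; with PySpark the two lists could come from something like `df.columns` on the target table and on the incoming files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchemaDiff:
    """Column-level difference between the expected and the incoming schema."""
    added: List[str]    # columns present only in the incoming feed
    removed: List[str]  # columns expected but missing from the incoming feed

    @property
    def is_breaking(self) -> bool:
        # Rule from the message above: a removed column is breaking,
        # an added column is allowed.
        return bool(self.removed)


def classify_schema_change(expected: List[str], incoming: List[str]) -> SchemaDiff:
    """Compare two column-name lists (hypothetical helper, names only, no types)."""
    expected_set, incoming_set = set(expected), set(incoming)
    return SchemaDiff(
        added=sorted(incoming_set - expected_set),
        removed=sorted(expected_set - incoming_set),
    )


# New column added: allowed.
ok = classify_schema_change(["id", "name"], ["id", "name", "email"])

# Completely different file dropped by mistake: every expected column is missing.
bad = classify_schema_change(["id", "name"], ["sensor", "reading"])
```

Here `ok.is_breaking` is false and `bad.is_breaking` is true, so the service could reject the second file instead of letting it add columns and fill the rest with NULLs. The sketch compares names only; type changes would need a separate check.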

Christopher Grant

04/12/2023, 3:49 PM
if you're outputting to Delta, consider constraints
👍 1
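One way the constraints suggestion could look, as a sketch only: the helper below builds Delta `CHECK` constraint statements for columns that must never be null, so a write where a missing column was filled with NULLs would fail instead of succeeding silently. The function name `not_null_constraint_statements`, the table name `feeds.target`, and the column names are hypothetical; each statement would be run once against the target table, for example via `spark.sql(...)`.

```python
from typing import List


def not_null_constraint_statements(table: str, required_columns: List[str]) -> List[str]:
    """Build one ALTER TABLE ... ADD CONSTRAINT statement per required column.

    Hypothetical helper: it only produces SQL strings and does not talk to Spark.
    """
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT {col}_not_null CHECK ({col} IS NOT NULL)"
        for col in required_columns
    ]


# Assumed table and column names, for illustration only.
statements = not_null_constraint_statements("feeds.target", ["id", "name"])
for stmt in statements:
    print(stmt)  # in a real job: spark.sql(stmt)
```

This only helps for columns that are never legitimately null; a nullable column that disappears from the feed would still pass.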