Can Autoloader detect when a new feed is missing columns?
From what I see, it automatically fills the missing columns with nulls.
04/12/2023, 9:01 AM
That is the beauty of schema evolution and schema-on-read systems
04/12/2023, 11:52 AM
Yeah, that's from automerging the schema. Autoloader isn't detecting that per se; it's more about what happens on the write itself
04/12/2023, 2:53 PM
Yes, it's a beauty in some scenarios, but not in all.
We (with Hubert 🙂 ) are actually trying to build a generic service that can load parquet files from defined locations. The parquet files are created by upstream services.
We want to be able to detect schema changes and classify them as:
• non-breaking changes - new column added - allowed
• breaking changes - column removed - not allowed
For example if a completely different file is dropped into a location by mistake, we don't want autoloader to create "n" additional columns and fill the rest with NULLs. 😅
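A minimal sketch of the classification we have in mind, assuming both schemas are available as plain dicts of column name to type string. `SchemaDiff` and `diff_schemas` are hypothetical names for illustration, not Autoloader APIs, and treating a changed column type as breaking is an extra assumption on top of the two rules above:

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Result of comparing an expected schema with an incoming one."""
    added: list = field(default_factory=list)    # columns only in the incoming schema
    removed: list = field(default_factory=list)  # columns only in the expected schema
    retyped: list = field(default_factory=list)  # columns present in both, type differs

    @property
    def is_breaking(self) -> bool:
        # Removed columns are breaking (per the rules above).
        # Type changes are treated as breaking too; that part is an assumption.
        return bool(self.removed or self.retyped)


def diff_schemas(expected: dict, incoming: dict) -> SchemaDiff:
    """Compare two {column_name: type_string} mappings."""
    expected_cols, incoming_cols = set(expected), set(incoming)
    return SchemaDiff(
        added=sorted(incoming_cols - expected_cols),
        removed=sorted(expected_cols - incoming_cols),
        retyped=sorted(
            c for c in expected_cols & incoming_cols if expected[c] != incoming[c]
        ),
    )


if __name__ == "__main__":
    expected = {"id": "bigint", "name": "string"}

    # New column added: non-breaking, allowed.
    print(diff_schemas(expected, {"id": "bigint", "name": "string", "email": "string"}))

    # Completely different file dropped by mistake: every expected column is gone.
    wrong_file = diff_schemas(expected, {"foo": "int", "bar": "int"})
    print(wrong_file, wrong_file.is_breaking)
```

With PySpark, the dicts could probably be built from a DataFrame as `{f.name: f.dataType.simpleString() for f in df.schema.fields}`, and the check would run before the write, so a breaking diff can reject the file instead of letting the merge fill NULLs.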