Satyam Singh

04/06/2023, 4:27 PM
I have created a Delta table with columns, e.g. ColA, ColB, ColC, ColD. Scenario: first, one notebook writes to this Delta table from a DataFrame that has columns ColA and ColB (i.e. only the first two columns). Then a second notebook writes to the same Delta table from a DataFrame that has columns ColA, ColC and ColD. When the second notebook tries to write to the Delta table, it throws a "schema mismatch" error (my investigation suggests this is because the first notebook only wrote two columns). I do not want to enable the schema merge option. Looking for suggestions.

Jim Hibbard

04/06/2023, 4:36 PM
Out of curiosity, why don't you want to use the "mergeSchema" option? Alternatively, you could have two separate Delta tables.

Satyam Singh

04/06/2023, 4:44 PM
Actually, the Delta table has more than 30 columns, and both notebooks write to the same Delta table; the two notebooks just don't have data for a few of the columns. I want to maintain tight control over the schema, so I don't want the table schema to be updated automatically. But I shouldn't need to worry about this: I have already created the Delta table with a fixed schema, so even though different notebooks write only a subset of the column values, it should ideally just work.

Jim Hibbard

04/06/2023, 5:04 PM
That makes sense to me. I'd make the columns nullable and pass nulls for the missing column values then.

Satyam Singh

04/06/2023, 5:15 PM
I did that; still the same error. (In fact, this is the default behaviour of a Delta table: if you do not pass a value for a column, it puts null in that column.) I believe the error occurs because when you write a DataFrame with null values for some of the columns, there is still a data type mismatch for those columns: the DataFrame is not strongly typed, and the data types of those columns are not inferred.
Finally found a solution for this weird issue:
1. Created the Delta table with the schema.
2. Wrote an empty DataFrame with exactly the same schema as the Delta table.
3. After that, writing from different notebooks with DataFrames containing different subsets of columns works fine and no longer complains about a schema mismatch.
Thank you Jim for your advice. Really appreciate it.

Jim Hibbard

04/06/2023, 5:36 PM
Awesome! Glad it's working well for you now 🙌