Ian

03/28/2023, 6:06 AM
Hello team, I think there is a bug in delta-rs when reading a table schema. We take the JSON input below from the user and convert it to a Spark-compatible schema with a custom function. Scenario:
"schema": {
    "date": {
      "data_type": "string",
      "partition_column": true,
      "nullable": false
    },
    "shift": {
      "data_type": "string",
      "partition_column": false,
      "nullable": false
    }
}
The above is the schema of the table that we created, but deltaTable.schema() returns the schema below:
"schema": {
    "date": {
      "data_type": "string",
      "partition_column": true,
      "nullable": true
    },
    "shift": {
      "data_type": "string",
      "partition_column": false,
      "nullable": true
    }
}
nullable is returned as true even though we set it to false. We are using Spark to create the Delta table.
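The mismatch described above can be pinned down field by field. This is a minimal sketch, assuming both schemas are available as dicts shaped like the two snippets in this message; `nullable_mismatches` is a hypothetical helper name, not part of delta-rs or Spark.

```python
# Declared schema (what the user asked for) and reported schema
# (what deltaTable.schema() gave back), copied from the message above.
declared = {
    "date": {"data_type": "string", "partition_column": True, "nullable": False},
    "shift": {"data_type": "string", "partition_column": False, "nullable": False},
}
reported = {
    "date": {"data_type": "string", "partition_column": True, "nullable": True},
    "shift": {"data_type": "string", "partition_column": False, "nullable": True},
}


def nullable_mismatches(declared, reported):
    """Hypothetical helper: return {column: (declared, reported)} where nullable differs."""
    return {
        name: (spec["nullable"], reported[name]["nullable"])
        for name, spec in declared.items()
        if name in reported and spec["nullable"] != reported[name]["nullable"]
    }


mismatches = nullable_mismatches(declared, reported)
print(mismatches)
```

Both columns show up in the result, which matches the report: every declared `nullable: false` comes back as `true`.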
rtyler

03/28/2023, 6:10 AM
Are you able to look at the schema in the .json files in the _delta_log/ directory?
Ian

03/28/2023, 6:10 AM
yes
rtyler

03/28/2023, 6:15 AM
can you share the raw schema JSON from the files?
Ian

03/28/2023, 6:31 AM
yes
It is saved as true there, strange:
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"e34b6c7d-e6f3-48c2-87c4-b258a75ae0a3","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"date\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"shift\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1679984411962}}
{"add":{"path":"part-00000-9ea731c9-bf1c-4f97-b76c-3b36acb9960b-c000.snappy.parquet","partitionValues":{},"size":393,"modificationTime":1679984412396,"dataChange":true}}
{"commitInfo":{"timestamp":1679984412438,"operation":"WRITE","operationParameters":{"mode":"ErrorIfExists","partitionBy":"[]"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputRows":"0","numOutputBytes":"393"},"engineInfo":"Apache-Spark/3.2.1 Delta-Lake/1.1.0"}}
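To read the nullable flags out of a commit file like the one above without any Delta library, the metaData action can be parsed with the standard library alone. This is a sketch, assuming the commit file is newline-delimited JSON as shown here; `schema_nullability` is a hypothetical helper name, and only the first two quoted lines are embedded.

```python
import json

# The protocol and metaData lines quoted above, copied verbatim.
LOG_LINES = [
    r'{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}',
    r'{"metaData":{"id":"e34b6c7d-e6f3-48c2-87c4-b258a75ae0a3","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"date\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"shift\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1679984411962}}',
]


def schema_nullability(log_lines):
    """Hypothetical helper: map each top-level column to its nullable flag."""
    for line in log_lines:
        action = json.loads(line)
        if "metaData" in action:
            # schemaString is a JSON document embedded as a string, so it
            # needs a second json.loads call.
            schema = json.loads(action["metaData"]["schemaString"])
            return {field["name"]: field["nullable"] for field in schema["fields"]}
    return {}


flags = schema_nullability(LOG_LINES)
print(flags)
```

Since the flags are already `true` in the log itself, the reader is only reporting what the writer stored.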
rtyler

03/28/2023, 6:38 AM
What version of delta-rs are you working with?
Ian

03/28/2023, 6:40 AM
delta 0.4.2
deltalake 0.6.2
rtyler

03/28/2023, 6:41 AM
oh dang, that's a bit elderly 🙂 That might be a fixed bug, I was unable to reproduce the issue with the latest version of main.
Ian

03/28/2023, 6:41 AM
Tyler, thanks for your time. I seem to have identified the issue.
It's with Spark:
When writing any data to a Delta table, we will convert the data schema to nullable=true. The inconsistent part is that when creating a table WITHOUT any data, we don't do that.
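The effect described in that quote can be modeled on the schema JSON itself. This is only an illustration of the described behavior, not Spark's or Delta Lake's actual code; `as_nullable` is a hypothetical function that works on the struct JSON layout seen in schemaString.

```python
def as_nullable(dtype):
    """Return a copy of a Spark-style schema JSON type with every nullability flag set to True."""
    if not isinstance(dtype, dict):
        return dtype  # primitive type name such as "string"
    out = dict(dtype)
    kind = out.get("type")
    if kind == "struct":
        out["fields"] = [
            {**field, "type": as_nullable(field["type"]), "nullable": True}
            for field in out["fields"]
        ]
    elif kind == "array":
        out["elementType"] = as_nullable(out["elementType"])
        out["containsNull"] = True
    elif kind == "map":
        out["keyType"] = as_nullable(out["keyType"])
        out["valueType"] = as_nullable(out["valueType"])
        out["valueContainsNull"] = True
    return out


# The schema as declared by the user, in struct JSON form.
declared = {
    "type": "struct",
    "fields": [
        {"name": "date", "type": "string", "nullable": False, "metadata": {}},
        {"name": "shift", "type": "string", "nullable": False, "metadata": {}},
    ],
}

written = as_nullable(declared)
print([(f["name"], f["nullable"]) for f in written["fields"]])
```

Under this model, a table created by writing data ends up with the all-nullable schema, while a table created without data would keep the declared one.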