
ldacey

08/07/2023, 6:09 PM
I was trying to read data from a pyarrow dataset and write it to Delta Lake using polars, but ran into some data type issues with categorical columns and timestamps. My data was originally written with hard-coded pyarrow schemas, and we used dictionary types pretty heavily. Is this still supported in Delta Lake?
import pyarrow as pa

schema = pa.schema(
    [
        ("load_timestamp", pa.timestamp("ns", "UTC")),
        ("survey_completed", pa.timestamp("us", "America/New_York")),
        ("business", pa.dictionary(pa.int8(), pa.string())),
    ]
)
and the timestamp error:
Exception: Schema error: Invalid data type for Delta Lake: Timestamp(Microsecond, Some("America/New_York"))

Will Jones

08/07/2023, 6:29 PM
We haven't implemented automatic casting of data types yet, so you have to cast to the exact data types supported by Delta Lake. For timestamps, we only support no time zone or UTC at the moment.

ldacey

08/07/2023, 6:49 PM
got it, thanks
will there be support for dictionary columns eventually? for now I will just cast any categoricals to strings

Will Jones

08/07/2023, 6:52 PM
Yes, I think so