Ion

08/03/2023, 9:06 PM
Hi all, I am working on an issue in Polars related to delta-rs. I noticed that a datetime column with a timezone in a dataframe always gets written to Delta without the timezone. Why is that?

Will Jones

08/03/2023, 10:01 PM
I don't think Delta Lake supports timezones, except for reading as the local time zone.
I'm not sure if we are fully compliant with that ATM

Ion

08/04/2023, 6:53 AM
But there are two timestamp types: one without a timezone, and one with isAdjustedToUTC.
Shouldn't that one come back with timezone UTC in Arrow when you read the Delta table?

Will Jones

08/04/2023, 2:59 PM
I think it should, yes. I don't think that's been implemented yet though.

Ion

08/04/2023, 3:09 PM
So in the future, the only timezone that will be accepted is UTC?
Or will delta-rs cast it to the correct one?

Will Jones

08/04/2023, 3:11 PM
I think we can accept any time zone. When it is written, it will be written as UTC.
The question is what it will be converted to when read. IIRC Spark converts it to the user's local time zone.
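
A rough sketch of the "accept any time zone, write as UTC" idea, using only the Python standard library. It is not delta-rs code: the fixed offsets (+02:00 for the writer, -07:00 for the reader) and all variable names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical writer-side value: a wall-clock time at a fixed +02:00 offset.
writer_zone = timezone(timedelta(hours=2))
written = datetime(2023, 8, 4, 15, 11, tzinfo=writer_zone)

# "Written as UTC": normalize to the UTC instant before storing.
# The original +02:00 offset is not kept, only the instant.
stored_utc = written.astimezone(timezone.utc)

# Reader option A: hand the value back in UTC, whatever the machine's zone is.
read_as_utc = stored_utc

# Reader option B: convert to the reader's zone (hypothetical -07:00 offset),
# similar to the local-time-zone behaviour described for Spark above.
reader_zone = timezone(timedelta(hours=-7))
read_as_local = stored_utc.astimezone(reader_zone)

print(stored_utc.isoformat())     # 2023-08-04T13:11:00+00:00
print(read_as_local.isoformat())  # 2023-08-04T06:11:00-07:00

# All three values describe the same instant; only the displayed offset differs.
print(written == read_as_utc == read_as_local)  # True
```

Both read options return the same instant, so the choice only affects which offset the reader sees by default.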

Ion

08/04/2023, 5:21 PM
To me, the way Spark does it sounds wrong. If I store something in UTC, I want to read it back in UTC, without having to cast it back to UTC just because my PC is in a different timezone.

Will Jones

08/04/2023, 6:04 PM
🤷
😆 1

Ion

08/22/2023, 9:41 AM
@Will Jones I've created a ticket for this issue, btw: https://github.com/delta-io/delta-rs/issues/1598 The metadata is there in the Parquet file; it's simply not used when reading. From a quick glance over the repo, I couldn't find where the schema is read and inferred. I did see that to_pyarrow_table calls into the pyarrow dataset, which then calls another method to read the schema, but I can't find the implementation of how the schema is read. If this is something that is done purely in Python, I can try to fix it if you can give me some pointers :)
👍 1