
sharath

05/04/2023, 11:47 AM
Hello. I tried to partition data by Year, and I see on S3 that there are folders named "Year=2017" etc. (please check the image below). Is there any way to drop the "Year" label and get only "2017" as the folder name?

Matthew Powers

05/04/2023, 11:48 AM
No, these folder names follow Hive-style partitioning conventions, so this naming structure is important.

sharath

05/04/2023, 11:50 AM
ok, thank you !

Robert Kossendey

05/04/2023, 11:54 AM
@Matthew Powers just for my understanding: the Delta protocol does not require Hive-style partitioning, since partitions are discovered via the Delta log instead of file listing. The partitions are only Hive-style for interoperability, e.g. so the table can be converted to a native Parquet table, right?

Matthew Powers

05/04/2023, 11:55 AM
@Robert Kossendey - yep, that’s right.

sharath

05/04/2023, 11:57 AM
@Matthew Powers @Robert Kossendey - so there is no option to remove the Hive-style partitioning while writing to S3?

Matthew Powers

05/04/2023, 12:01 PM
The "Year=" part is important because the Year column is removed from the Parquet files that are written. So no option to remove the Hive-style partitioning, nor should there be.
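To make the point above concrete: because the Year column is not inside the Parquet files, a reader that works from file listing has to recover it from the directory name. Below is a minimal sketch of that recovery in plain Python. The function name and the example paths are made up for illustration, and real engines also handle type inference and escaping rules that are not shown here.

```python
from urllib.parse import unquote


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a file path."""
    partitions = {}
    # Drop the file name; only directory segments can carry partition values.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            # Assumption: values may be percent-encoded in the path.
            partitions[key] = unquote(value)
    return partitions


# Hypothetical paths: one with the Hive-style label, one without it.
labelled = parse_hive_partitions("s3://bucket/table/Year=2017/part-0000.parquet")
bare = parse_hive_partitions("s3://bucket/table/2017/part-0000.parquet")

print(labelled)
print(bare)
```

With the label, the reader can tell that 2017 belongs to a column called Year. With a bare "2017" folder there is nothing in the path to say which column the value belongs to.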

sharath

05/04/2023, 12:02 PM
ok, got it.. Thanks for the detailed explanation..

Gerhard Brueckl

05/04/2023, 1:31 PM
strictly speaking, the names of these folders have no meaning at all, the value for the partition is stored in the delta-log which then references a (data)file that just happens to be stored in one of these folders
coming to the root cause of the question (as I understood it):
- you cannot change the name of these folders
- nor should you rely on their name for various reasons
- you should use a proper connector to read the delta table (https://delta.io/integrations/)
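As a rough illustration of the point that the partition value lives in the log rather than in the folder name, here is a small sketch that reads the "add" actions from one commit. The commit contents are invented sample data, and the function name is hypothetical. It deliberately ignores "remove" actions and checkpoints, so it is not a correct table reader; a proper connector should be used for that.

```python
import json

# Invented sample of one commit file under _delta_log/; real commits carry
# more fields and more action types than shown here.
sample_commit = "\n".join([
    json.dumps({"metaData": {"partitionColumns": ["Year"]}}),
    json.dumps({"add": {"path": "Year=2017/part-0000.parquet",
                        "partitionValues": {"Year": "2017"},
                        "size": 1024, "modificationTime": 0,
                        "dataChange": True}}),
    # The folder name here says nothing about the year, yet the log still
    # records which partition the file belongs to.
    json.dumps({"add": {"path": "some-dir/part-0001.parquet",
                        "partitionValues": {"Year": "2018"},
                        "size": 2048, "modificationTime": 0,
                        "dataChange": True}}),
])


def partition_values_by_file(commit_text: str) -> dict[str, dict[str, str]]:
    """Map each added data file to the partition values recorded in the log."""
    values = {}
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        # Each line of a commit file is one JSON action.
        action = json.loads(line)
        add = action.get("add")
        if add is not None:
            values[add["path"]] = add.get("partitionValues", {})
    return values


files = partition_values_by_file(sample_commit)
for path, parts in files.items():
    print(path, parts)
```

The second file sits in a folder with an arbitrary name, and its Year value still comes out of the log entry.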

Kashyap Bhatt

05/04/2023, 2:52 PM
Not quite sure why this is even a question. It's "delta's internal stuff", isn't it? Or does Delta actually offer this folder structure as an interface for users to read directly? I mean, you're not supposed to be reading/writing in these folders anyway. If you really want a copy of your data in a specific folder structure, then read from Delta and write it out using a Parquet writer.

JosephK (exDatabricks)

05/04/2023, 3:43 PM
I'm also wondering why it even matters.