Lucas Zago

03/29/2023, 6:21 PM
Hi all, is it generally a best practice to partition a table, or are there cases where it is not recommended?

Jim Hibbard

03/29/2023, 6:30 PM
Hi Lucas! It depends on the size of the dataset and what values you have available to partition on, but generally speaking you can get performance improvements via partitioning or Z-ordering. For example, if you tend to access your data in a way where the records retrieved are clustered in time, then partitioning on day or month would probably improve performance.
A good rule of thumb is that each partition should contain at least ~1 GB of data to be worthwhile, and most tables smaller than ~1 TB probably aren't worth partitioning.
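To make that rule of thumb concrete, here is a rough Python sketch. The function name `worth_partitioning` and its parameters are made up for illustration, and it assumes data is spread evenly across partition values, which real tables often are not.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def worth_partitioning(table_bytes, distinct_partition_values,
                       min_table_bytes=TB, min_partition_bytes=GB):
    """Apply the ~1 TB table / ~1 GB per partition rule of thumb.

    Hypothetical helper: assumes rows are spread evenly across the
    candidate partition values, so it only gives a first estimate.
    """
    if distinct_partition_values <= 0:
        raise ValueError("distinct_partition_values must be positive")
    # Tables below ~1 TB are probably not worth partitioning at all.
    if table_bytes < min_table_bytes:
        return False
    # Each partition should hold at least ~1 GB on average.
    average_partition_bytes = table_bytes / distinct_partition_values
    return average_partition_bytes >= min_partition_bytes


# A 2 TB table partitioned by day over one year: about 5.6 GB per partition.
print(worth_partitioning(2 * TB, 365))
```

If the table is skewed (say, most rows fall on a few days), it is safer to check the smallest partitions directly rather than the average.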
👍 2

JosephK (exDatabricks)

03/29/2023, 6:37 PM
Each file should be about 1 GB in size and each partition about 10-50 GB. Be careful about overpartitioning.
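As a quick sanity check on those numbers, a small Python sketch like the one below can estimate how many partitions a table of a given size can support. The helper names `partition_count_range` and `is_overpartitioned` are hypothetical, and the math assumes evenly sized partitions.

```python
import math

GB = 1024 ** 3
TB = 1024 ** 4


def partition_count_range(table_bytes, min_partition_bytes=10 * GB,
                          max_partition_bytes=50 * GB):
    """Return (fewest, most) partitions that keep each one at 10-50 GB.

    Hypothetical helper: assumes partitions end up roughly equal in size.
    """
    fewest = max(1, math.ceil(table_bytes / max_partition_bytes))
    most = max(1, table_bytes // min_partition_bytes)
    return fewest, most


def is_overpartitioned(table_bytes, partition_count,
                       min_partition_bytes=10 * GB):
    """True when the average partition falls below the ~10 GB floor."""
    return table_bytes / partition_count < min_partition_bytes


# A 2 TB table supports roughly 41 to 204 partitions under this guideline.
print(partition_count_range(2 * TB))
# Daily partitions over a year (365) would average about 5.6 GB each.
print(is_overpartitioned(2 * TB, 365))
```

For that 2 TB example, monthly partitions over a few years would land inside the range, while daily partitions would already be on the small side.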
👍 2