kamal kaur
05/05/2023, 7:48 PM
[...] id with around 500k unique values. We are using the merge operation on that table by id, and we will be querying it with Trino by that id. I know that is too many partitions, but if the use case is just to query/merge that table by id, will this work?
chris fish
05/05/2023, 7:54 PM
[...] id is usually never a good idea
Theo LEBRUN
05/05/2023, 8:02 PM
[...] id if you have 500k unique values! Everything will work, but you're going to have perf issues
kamal kaur
05/05/2023, 8:07 PM
Theo LEBRUN
05/05/2023, 8:12 PM
[...] id that's "fine", but querying without using id will be very bad…
You don't have access to a date column that you can partition on? Also, if it's only 500k total records, I would not partition the data.
chris fish
05/05/2023, 8:44 PM
Theo LEBRUN
05/05/2023, 8:46 PM
kamal kaur
05/05/2023, 9:14 PM
chris fish
05/05/2023, 9:24 PM
Dominique Brezinski
05/07/2023, 1:42 PM
Madhumita Bharde
05/10/2023, 3:52 PM
"high cardinality partitioning tends to impact write performance quite a bit."
I may be wrong, but if the writes are exclusively in overwrite/merge mode, wouldn't they actually benefit from partitioning the output table by the id that we merge on?
chris fish
05/10/2023, 6:50 PM
Theo LEBRUN
05/10/2023, 6:57 PM
Madhumita Bharde
05/10/2023, 7:13 PM
"my general rule of thumb is partitions should be at least 1 GB bare minimum. you still have data skipping by IDs as well."
hmm
"If you write/read using only ID then maybe a key-value DB is a better fit 🙂"
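To put the 1 GB rule of thumb quoted above next to the 500k distinct ids from the original question, here is a rough back-of-the-envelope sketch. The thread never states the table size, so the 100 GiB figure is purely an assumption for illustration, as are the helper names; the calculation also assumes data is spread evenly across ids.

```python
GIB = 1024 ** 3  # bytes in one GiB


def avg_partition_bytes(table_bytes: int, num_partitions: int) -> float:
    """Average bytes per partition, assuming an even spread across partitions."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return table_bytes / num_partitions


def meets_rule_of_thumb(table_bytes: int, num_partitions: int,
                        min_bytes: int = GIB) -> bool:
    """True if the average partition reaches the minimum size (1 GiB by default)."""
    return avg_partition_bytes(table_bytes, num_partitions) >= min_bytes


def table_bytes_needed(num_partitions: int, min_bytes: int = GIB) -> int:
    """Table size required for every partition to average at least min_bytes."""
    return num_partitions * min_bytes


# Assumed table size (NOT from the thread); 500k ids is from the question.
assumed_table_bytes = 100 * GIB
num_ids = 500_000

avg = avg_partition_bytes(assumed_table_bytes, num_ids)
print(f"average partition size: {avg / 1024:.1f} KiB")
print("meets 1 GiB rule:", meets_rule_of_thumb(assumed_table_bytes, num_ids))
print(f"table size needed: {table_bytes_needed(num_ids) / 1024 ** 4:.1f} TiB")
```

Under these assumptions each partition averages a few hundred KiB, several orders of magnitude below the rule of thumb, which is consistent with the advice in the thread not to partition by id.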
Dominique Brezinski
05/10/2023, 7:17 PM
Madhumita Bharde
05/10/2023, 7:22 PM
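The remark in the thread that "you still have data skipping by IDs" can be illustrated with a simplified model of min/max file statistics. This is only a sketch of the general idea, not Delta Lake's or Trino's actual implementation; the `FileStats` type, the `files_to_scan` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Per-file statistics for the id column (hypothetical, simplified)."""
    path: str
    min_id: int
    max_id: int


def files_to_scan(files: List[FileStats], wanted_id: int) -> List[str]:
    """Keep only files whose [min_id, max_id] range could contain wanted_id."""
    return [f.path for f in files if f.min_id <= wanted_id <= f.max_id]


# An unpartitioned table of 500k ids stored in four files, with rows
# sorted by id so that the per-file ranges do not overlap.
files = [
    FileStats("part-0000.parquet", 0, 124_999),
    FileStats("part-0001.parquet", 125_000, 249_999),
    FileStats("part-0002.parquet", 250_000, 374_999),
    FileStats("part-0003.parquet", 375_000, 499_999),
]

print(files_to_scan(files, 300_000))  # ['part-0002.parquet']
```

In this model a lookup by id touches one file out of four without any partitioning. The benefit depends on the data being laid out so that file ranges overlap as little as possible; with randomly distributed ids, every file's range would cover most values and little would be skipped.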