Barny Self

07/10/2023, 9:40 PM
Just tested liquid clustering vs partitioning on a 1 PB table. Results are extraordinary, to say the least. This was just a test table, and the final table will prob be about 200 PB, but I will feed my findings back to the product group. Simply amazed so far.
👀 4
🔥 3

rtyler

07/10/2023, 11:49 PM
dannnng that's a big table

Dominique Brezinski

07/11/2023, 1:38 AM
Good to know! I have a great candidate table for liquid that will be in that size range. Encouraging

Jordan Fox

07/11/2023, 5:28 AM
Results were extraordinary... but what were the results?
👀 2
☝️ 1
😆 2

Felipe Pessoto

07/12/2023, 10:12 PM
How did you test it? Is it in 3.0.0 RC1?

Dominique Brezinski

07/12/2023, 10:13 PM
Would be in Databricks Runtime 13....

Barny Self

07/12/2023, 10:22 PM
DBR 13.2. I'm seeing some unexpected results so far, so I'm going to talk to the team before sharing anything further, because it is probably a me-being-an-idiot situation 🤣

Venkat Hari

07/17/2023, 7:36 PM
@Barny Self, any updates on this? My team and I are seeing read operations perform 1x-2x slower on a 500GB table where liquid clustering is enabled on the day time period column, compared to both the partitioned and the non-partitioned versions.
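
For readers following along, the three layouts being compared here can be sketched as table definitions. This is only an illustration, not what anyone in the thread actually ran: the table names, the `event_date` column, and the `payload` column are hypothetical, and the helper simply builds SQL strings using the `CLUSTER BY` (liquid clustering) and `PARTITIONED BY` clauses of Databricks SQL.

```python
def create_table_ddl(table, columns, layout="none", key=None):
    """Build a CREATE TABLE statement for a Delta table.

    layout is "cluster" (liquid clustering), "partition", or "none".
    key is the column to cluster or partition by.
    """
    if layout not in ("cluster", "partition", "none"):
        raise ValueError(f"unknown layout: {layout}")
    if layout != "none" and key is None:
        raise ValueError("a key column is required for this layout")

    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE {table} ({cols}) USING DELTA"
    if layout == "cluster":
        ddl += f" CLUSTER BY ({key})"
    elif layout == "partition":
        ddl += f" PARTITIONED BY ({key})"
    return ddl


# Hypothetical schema: the thread does not give the real one.
COLUMNS = [("event_date", "DATE"), ("payload", "STRING")]

clustered = create_table_ddl("events_clustered", COLUMNS, "cluster", "event_date")
partitioned = create_table_ddl("events_partitioned", COLUMNS, "partition", "event_date")
plain = create_table_ddl("events_plain", COLUMNS)

for statement in (clustered, partitioned, plain):
    print(statement)
```

Running the same read queries against each of the three tables is one way to reproduce the kind of comparison described above.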

Barny Self

07/17/2023, 9:44 PM
@Venkat Hari, I think your table is too small to benefit from clustering with such low cardinality on your partitioning columns. Partitioning is still prob your best bet here. I'm still working with the product team to investigate a few options.

Venkat Hari

07/17/2023, 11:48 PM
Thanks, @Barny Self, for the answer.
Still, the table is less than 1TB, and the version without partitioning gives better I/O on Databricks on AWS. It would be nice to mention in the Databricks docs that liquid clustering is only for very large tables.

Barny Self

07/18/2023, 11:00 AM
It is not just size but cardinality as well. I agree with you, and I'm working out what extra info needs to be provided in the docs.

Venkat Hari

07/19/2023, 12:18 AM
Ah, I missed the point about cardinality you mentioned in your previous message. Interesting. I'm going to try it on a high-cardinality column of a 1TB table and will post an update here.
👍 1

Oliver Angelil

07/26/2023, 7:07 PM
@Venkat Hari what do you mean by "day time period" column? TimeStamp? Isn't that high cardinality?

Venkat Hari

07/27/2023, 1:38 PM
@Oliver Angelil, it's not that high, as I will have 365 daily partitions per year with 1GB of data.
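
As a rough check on the numbers in this thread, the cardinality and the amount of data per distinct value can be worked out in a couple of lines. This is a back-of-the-envelope sketch only: it assumes the 500GB table mentioned earlier holds exactly one year of daily values, which the thread does not state, and the function name is made up for illustration.

```python
def gb_per_distinct_value(table_gb, distinct_values):
    """Average amount of data (in GB) behind each distinct value of a column."""
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    return table_gb / distinct_values


# Assumption: 500 GB spread evenly over one year of daily values.
table_gb = 500
distinct_days = 365

gb_per_value = gb_per_distinct_value(table_gb, distinct_days)
print(f"{distinct_days} distinct values, about {gb_per_value:.2f} GB each")
# prints "365 distinct values, about 1.37 GB each"
```

A few hundred distinct values with roughly a gigabyte each is the low-cardinality, small-table case discussed above, where partitioning was suggested as the better fit.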

Oliver Angelil

07/27/2023, 2:58 PM
@Venkat Hari why don't you use Ingestion Time Clustering? Are you sure Liquid Clustering is suitable for your use case?