Ran Razy  07/24/2023, 6:35 AM
We have a table partitioned by a column called `ts` (which stands for "timestamp"), truncated to a 1-hour interval. When we try to get max(ts) for this table, it seems that Spark (3.3.2) is scanning a lot of files, and the operation takes a long time. Is there any way to optimize this? Can we rely on some metadata, or configure the table in some way to help Spark calculate this more efficiently? Thanks!
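One metadata-based approach (a sketch, not something suggested in the thread): because `ts` is a partition column, each file's partition value is recorded in the Delta transaction log, so max(ts) can in principle be derived without scanning data files. The table path and column names below are hypothetical, and this naive version reads only the JSON commits (no checkpoint files) and ignores removed files, so treat it as illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Each "add" action in the Delta log records the file's partition values
# as strings; no data files are touched by this query.
log = spark.read.json("/data/events/_delta_log/*.json")

# A fixed "yyyy-MM-dd HH:mm:ss"-style string sorts chronologically under
# lexicographic comparison, so max() over the raw strings is safe here.
max_ts = (
    log.where(F.col("add").isNotNull())
       .agg(F.max("add.partitionValues.ts").alias("max_ts"))
       .first()["max_ts"]
)
print(max_ts)
```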
Tom van Bussel  07/24/2023, 7:52 AM
If you partition by `ts`, then this will result in a very large number of files. For high-cardinality columns we recommend using Z-ordering (and Liquid in the future) instead of partitioning.
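A minimal sketch of Tom's suggestion, assuming Delta Lake 2.0+ (where OPTIMIZE with ZORDER BY is available) and a hypothetical table name:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Assumes a Spark session already configured with the Delta extensions.
spark = SparkSession.builder.getOrCreate()

# Keep ts as a regular (non-partition) column and cluster data files by it,
# so data skipping can prune files for queries on ts.
dt = DeltaTable.forName(spark, "events")  # "events" is a hypothetical name
dt.optimize().executeZOrderBy("ts")

# SQL equivalent: OPTIMIZE events ZORDER BY (ts)
```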
Itai Yaffe  07/27/2023, 1:28 PM
Re: the `ts` partition column, Ran mentioned it's truncated to a 1-hour interval, so I assume the cardinality is not very high.
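For scale, hourly truncation yields 24 partitions per day, i.e. roughly 24 × 365 = 8,760 per year. An illustrative sketch of how such a truncated partition column is typically produced (the table and source-column names are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("events_raw")  # hypothetical source table

# Derive the partition column by truncating the event timestamp to the hour.
(df.withColumn("ts", F.date_trunc("hour", F.col("event_time")))
   .write.format("delta")
   .mode("append")
   .partitionBy("ts")
   .saveAsTable("events"))
```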