Ran Razy07/24/2023, 6:35 AM
, which stands for a timestamp. When we try to get for this table, it seems that Spark (3.3.2) scans a very large number of files and the operation takes a long time. Is there any way to optimize this, e.g. by relying on some metadata, or by configuring the table so that Spark can compute this more efficiently? Thanks!
Tom van Bussel07/24/2023, 7:52 AM
If you partition by a high-cardinality column, then this will result in a very large number of files. For high-cardinality columns we recommend using Z-ordering (and Liquid Clustering in the future) instead of partitioning.
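As a sketch of that recommendation, Z-ordering in Delta Lake is applied with the `OPTIMIZE ... ZORDER BY` command (available since Delta Lake 2.0; the table and column names below are hypothetical):

```sql
-- Hypothetical names: table "events", timestamp column "event_time".
-- Compacts small files and co-locates rows with nearby event_time values,
-- so file-level min/max statistics can prune files during queries
-- without needing a fine-grained partitioning scheme.
OPTIMIZE events
ZORDER BY (event_time)
```

Because Z-ordering clusters within files rather than creating one directory per value, it avoids the file-count explosion that partitioning on a high-cardinality column causes.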
Itai Yaffe07/27/2023, 1:28 PM
Regarding the partition column, Ran mentioned it's truncated to a 1-hour interval, so I assume the cardinality is not very high.
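To illustrate why hour-truncation keeps the cardinality modest, here is a minimal stdlib-only sketch (the function name and the synthetic event stream are made up for the example):

```python
from datetime import datetime, timedelta

def truncate_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to its 1-hour bucket (the partition value)."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Even a dense event stream collapses to at most one partition value per
# hour: a full year produces only ~8,760 distinct buckets, versus millions
# of values for a raw (second- or millisecond-granularity) timestamp.
start = datetime(2023, 1, 1)
events = [start + timedelta(minutes=7 * i) for i in range(10_000)]
partitions = {truncate_to_hour(ts) for ts in events}
print(len(partitions))  # far fewer buckets than events
```

This is the same idea as Spark's `date_trunc("hour", ...)` applied to a generated partition column.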