
Martin

07/17/2023, 9:57 AM
Answer COUNT(*) using Delta log metadata for partitioned tables
Delta 2.2.0 introduced a feature:
Aggregate pushdown into Delta scan for SELECT COUNT(*). Aggregation queries such as `SELECT COUNT(*)` on Delta tables are satisfied using file-level row counts in Delta table metadata rather than counting rows in the underlying data files. This significantly reduces the query time as the query just needs to read the table metadata and could make full table count queries faster by 10-100x.
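To illustrate the idea, here is a minimal Python sketch of answering a full-table `COUNT(*)` from log metadata alone. It is not Delta's actual Scala implementation. It assumes you already have the contents of the `_delta_log/` JSON commit files in order, that there are no Parquet checkpoints or deletion vectors to account for, and that `add` actions carry a `stats` string with `numRecords`. The names `count_from_log` and `make_add` are hypothetical.

```python
import json
from typing import Iterable, Optional


def count_from_log(commits: Iterable[str]) -> Optional[int]:
    """Answer a full-table COUNT(*) from Delta log metadata only.

    `commits` holds the contents of the JSON commit files, oldest first.
    Add actions register files, remove actions retire them, and other
    actions (metaData, protocol, commitInfo) are ignored.
    """
    live = {}  # path -> add action for files currently in the table
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)

    total = 0
    for add in live.values():
        stats = add.get("stats")  # a JSON-encoded string; may be absent
        num = json.loads(stats).get("numRecords") if stats else None
        if num is None:
            return None  # no row count: the data files would have to be scanned
        total += num
    return total


def make_add(path: str, num_records: int) -> str:
    # Hypothetical helper that builds a minimal add action for the demo.
    return json.dumps({"add": {"path": path, "partitionValues": {},
                               "stats": json.dumps({"numRecords": num_records})}})


commit0 = "\n".join([make_add("a.parquet", 3), make_add("b.parquet", 5)])
commit1 = json.dumps({"remove": {"path": "a.parquet"}})
print(count_from_log([commit0, commit1]))  # 5
```

Returning `None` when any file lacks stats reflects that a metadata-only answer is only possible when every live file has a row count.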
Does this also work when running `SELECT COUNT(*) FROM my_table WHERE partition_column = 1` if `my_table` is partitioned by `partition_column`?

Itai Yaffe

07/17/2023, 10:16 AM
Interesting question! I'm wondering something similar about aggregations like `MAX(partition_column)`.
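Both questions rest on the fact that each `add` action in the Delta log records the file's `partitionValues` (stored as strings) next to its stats. The following hedged Python sketch shows how a partition-filtered `COUNT(*)` and `MAX(partition_column)` could in principle be answered without reading data files. Martin's tests later in this thread suggest Delta 2.2.0 only pushes down the unfiltered count, so this illustrates what the metadata allows, not existing Delta behavior. It assumes ordered JSON commit contents, no checkpoints, no deletion vectors, and `numRecords` stats. All function names are hypothetical.

```python
import json
from typing import Callable, Iterable, Optional


def live_add_actions(commits: Iterable[str]) -> list:
    """Replay newline-delimited JSON commits (oldest first) and return
    the add actions of files that have not been removed."""
    live = {}
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return list(live.values())


def count_where_partition(commits, column: str, value: str) -> Optional[int]:
    """COUNT(*) ... WHERE column = value, using only partitionValues and
    the numRecords statistic; None if a matching file has no stats."""
    total = 0
    for add in live_add_actions(commits):
        if add.get("partitionValues", {}).get(column) != value:
            continue  # the whole file is pruned by its partition value
        stats = add.get("stats")
        num = json.loads(stats).get("numRecords") if stats else None
        if num is None:
            return None
        total += num
    return total


def max_partition_value(commits, column: str, cast: Callable = int):
    """MAX(column) for a partition column. Values are strings in the log,
    so they are cast before comparing; null partitions and files whose
    stats report zero rows are skipped."""
    values = []
    for add in live_add_actions(commits):
        stats = add.get("stats")
        if stats and json.loads(stats).get("numRecords") == 0:
            continue  # an empty file contributes no rows
        raw = add.get("partitionValues", {}).get(column)
        if raw is not None:
            values.append(cast(raw))
    return max(values) if values else None


def make_add(path: str, num_records: int, **partition_values) -> str:
    # Hypothetical helper that builds a minimal add action for the demo.
    return json.dumps({"add": {
        "path": path,
        "partitionValues": {k: str(v) for k, v in partition_values.items()},
        "stats": json.dumps({"numRecords": num_records}),
    }})


commit0 = "\n".join([
    make_add("partition_column=1/a.parquet", 3, partition_column=1),
    make_add("partition_column=1/b.parquet", 4, partition_column=1),
    make_add("partition_column=2/c.parquet", 10, partition_column=2),
])
print(count_where_partition([commit0], "partition_column", "1"))  # 7
print(max_partition_value([commit0], "partition_column"))  # 2
```

The `cast` parameter matters because partition values are compared as strings otherwise, so `"10"` would sort before `"9"`.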

Martin

07/17/2023, 11:28 AM
My first tests indicate that the feature only works for `COUNT(*)` on the entire table. 🙁 Would this be worth a feature request? I'm not a Scala developer, but I can describe what the feature should do.

Dominique Brezinski

07/17/2023, 1:56 PM
Yes, add a feature request. I suspect there might already be one, so do some searching first.

Felipe Pessoto

07/18/2023, 2:41 AM

Itai Yaffe

07/18/2023, 12:58 PM
@Gerhard Brueckl - see above (per your comment on LinkedIn)

Gerhard Brueckl

07/18/2023, 2:33 PM
Thanks for the quick answer, will have a look later today!

Itai Yaffe

07/24/2023, 11:55 AM
@Ran Razy