Bhaskar

07/07/2023, 6:08 PM
Hello, how can we get a record count for each partition value of a Delta table without reading the table itself (i.e. without running select count)? I was looking around deltaLog.

Jordan Cuevas

07/07/2023, 6:39 PM
when you say without reading the delta table, do you mean you don't want to load the table at all, or you don't want to pull the full contents into memory? If the latter, running select count and grouping by the partition column will only read the metadata and should be very fast

chris fish

07/07/2023, 7:36 PM
there's been a lot of discussion about adding a nice high-level API for this, feel free to submit a GitHub issue. it would be nice to be able to load the metadata with a direct API

Adam Binford

07/07/2023, 8:29 PM
yeah I was thinking about trying to add a Python wrapper that just gives you the add actions from the delta log as a DataFrame. you can hack around it and pull them from the JVM to get super quick stats from the delta log. you can theoretically get it from Scala land too, but that's not an official public API

chris fish

07/07/2023, 8:53 PM
yeah all this info is in the `snapshot` object, but it's an internal class, not exposed as an easy-to-use public API