Matthew Powers
03/06/2023, 3:21 PM
get_add_actions API to give users new insights into the sizes of files in their Delta table. Here’s example usage:
import levi
from deltalake import DeltaTable
dt = DeltaTable("some_folder/some_table")
levi.delta_file_sizes(dt)
# return value
{
'num_files_<1mb': 345,
'num_files_1mb-500mb': 588,
'num_files_500mb-1gb': 960,
'num_files_1gb-2gb': 0,
'num_files_>2gb': 5
}
This makes it clear that the Delta table contains lots of small files and 5 huge files. This Delta table will almost certainly cause slow queries, so the user should optimize.
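For anyone wondering how counts like these could be produced, here is a minimal sketch of the bucketing step. It is not levi's actual implementation. The function name `bucket_file_sizes`, the binary units (1mb = 1024**2 bytes), and the boundary handling are all assumptions made for illustration. The input is a plain list of file sizes in bytes; with deltalake these would likely come from the `size_bytes` column that `dt.get_add_actions(flatten=True)` returns, though the exact return type depends on the deltalake version.

```python
from bisect import bisect_right

# Assumption: binary units; levi may define these thresholds differently.
MB = 1024 ** 2
GB = 1024 ** 3

# Upper bounds for every bucket except the last one.
BOUNDS = [1 * MB, 500 * MB, 1 * GB, 2 * GB]
LABELS = [
    "num_files_<1mb",
    "num_files_1mb-500mb",
    "num_files_500mb-1gb",
    "num_files_1gb-2gb",
    "num_files_>2gb",
]


def bucket_file_sizes(sizes_bytes):
    """Count files per size bucket (hypothetical helper, not part of levi).

    A size that sits exactly on a boundary is counted in the larger bucket,
    so a file of exactly 2 GiB lands in 'num_files_>2gb'.
    """
    counts = {label: 0 for label in LABELS}
    for size in sizes_bytes:
        # bisect_right gives the index of the first bound greater than size,
        # which is also the index of the bucket the file belongs to.
        counts[LABELS[bisect_right(BOUNDS, size)]] += 1
    return counts
```

Calling `bucket_file_sizes([10, 600 * MB, 3 * GB])` returns a dict with the same keys as the output above, with one file each in the `<1mb`, `500mb-1gb`, and `>2gb` buckets.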
This is the first API that makes it easy to access this data. Think this is a big step in the right direction to making Delta tables more usable!
Jeremy Jordan
03/06/2023, 4:37 PM