Matthew Powers
02/23/2023, 1:06 PM
Ian Joiner
02/27/2023, 3:18 AM
Matthew Powers
02/28/2023, 1:59 PM
Matthew Powers
02/28/2023, 3:29 PM
Yousry Mohamed
02/28/2023, 10:35 PM
rusoto_dynamodb
https://github.com/delta-io/delta-rs/issues/1191
Matthew Powers
03/01/2023, 3:31 PM
Matthew Powers
03/01/2023, 8:23 PM
Matthew Powers
03/03/2023, 3:18 PM
Will Jones
03/03/2023, 5:45 PM
Matthew Powers
03/06/2023, 3:21 PM
get_add_actions API to give users new insights into the sizes of files in their Delta table. Here's example usage:
import levi
from deltalake import DeltaTable
dt = DeltaTable("some_folder/some_table")
levi.delta_file_sizes(dt)
# return value
{
'num_files_<1mb': 345,
'num_files_1mb-500mb': 588,
'num_files_500mb-1gb': 960,
'num_files_1gb-2gb': 0,
'num_files_>2gb': 5
}
This makes it clear that the Delta table contains lots of small files (345 under 1 MB) and 5 huge files (over 2 GB). A table like this will almost certainly cause slow queries, so the user should optimize it.
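The bucketing behind output like the above can be sketched in plain Python. This is a minimal illustration, not levi's actual code: the helper name bucket_file_sizes, the decimal units (1 MB = 1,000,000 bytes), and the rule that each upper bound is exclusive (so a file of exactly 2 GB lands in >2gb) are all assumptions. The per-file sizes are assumed to come from the add actions in the Delta log; with the deltalake Python package they should be available in the size_bytes column of dt.get_add_actions(), but check that column name against your installed version.

```python
from typing import Dict, Iterable

# Assumed decimal units; levi may define MB/GB differently.
MB = 1_000_000
GB = 1_000 * MB


def bucket_file_sizes(sizes_in_bytes: Iterable[int]) -> Dict[str, int]:
    """Hypothetical helper: count files per size bucket.

    Keys mirror the levi output above. Each upper bound is exclusive, so a
    file of exactly 1 GB counts as '1gb-2gb' and exactly 2 GB as '>2gb'.
    """
    counts = {
        "num_files_<1mb": 0,
        "num_files_1mb-500mb": 0,
        "num_files_500mb-1gb": 0,
        "num_files_1gb-2gb": 0,
        "num_files_>2gb": 0,
    }
    for size in sizes_in_bytes:
        if size < 1 * MB:
            counts["num_files_<1mb"] += 1
        elif size < 500 * MB:
            counts["num_files_1mb-500mb"] += 1
        elif size < 1 * GB:
            counts["num_files_500mb-1gb"] += 1
        elif size < 2 * GB:
            counts["num_files_1gb-2gb"] += 1
        else:
            counts["num_files_>2gb"] += 1
    return counts


# Made-up sizes for illustration. With a real table you might instead pass
# dt.get_add_actions().to_pydict()["size_bytes"]  (column name assumed).
print(bucket_file_sizes([200, 5 * MB, 700 * MB, 3 * GB, 3 * GB]))
# {'num_files_<1mb': 1, 'num_files_1mb-500mb': 1, 'num_files_500mb-1gb': 1, 'num_files_1gb-2gb': 0, 'num_files_>2gb': 2}
```

The counting logic does not depend on deltalake at all, so the same function works on sizes gathered from any source.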
This is the first API that makes it easy to access this data. I think this is a big step in the right direction toward making Delta tables more usable!
Matthew Powers
03/08/2023, 12:25 AM
Matthew Powers
03/08/2023, 12:56 PM
Ian Joiner
03/10/2023, 4:11 AM
Ian Joiner
03/10/2023, 12:26 PM
Matthew Powers
03/10/2023, 3:22 PM
levi.skipped_stats(delta_table, filters=[('a_float', '=', 4.5)]) function that returns {'num_files': 2, 'num_files_skipped': 1, 'num_bytes_skipped': 996}. This allows users to see how many files, and how much data, get skipped for different predicates. It'll help them figure out when they should Z ORDER, etc. The new get_add_actions API is opening up all sorts of new possibilities for getting insights into Delta tables. Here's the code if you're interested.
Ian Joiner
03/10/2023, 10:05 PM
Ian Joiner
03/10/2023, 10:06 PM
Matthew Powers
03/11/2023, 12:44 PM
Will Jones
03/12/2023, 2:17 AM
-Zminimal-versions (or probably better with the upcoming -Zdirect-minimal-versions). That might allow us to be more flexible with our Arrow dependency, so we could, for example, support a range like arrow >= 30, <= 32. DataFusion probably has more frequent breaking changes in its API, but as it gets more stable, maybe we could support a wider range for that too.
Will Jones
03/12/2023, 11:28 PM
Will Jones
03/14/2023, 12:07 AM
Jeremy Jordan
03/14/2023, 5:18 PM
delta-rs? I don't see it on the roadmap.
rtyler
03/16/2023, 2:30 AM
Alex Wilcoxson
03/16/2023, 3:27 PM
Will Jones
03/16/2023, 11:36 PM
Ben Temple
03/21/2023, 9:48 AM
open_table_with_storage_options, but not sure how to do this from the lambda itself.
Thanks
Filippo Vecchiato
03/21/2023, 9:59 AM
rtyler
03/22/2023, 1:01 AM
unsafe floating around I am running into more and more. 🤦 🙀
Matthew Powers
03/24/2023, 11:12 AM
rtyler
03/25/2023, 8:04 PM