Matt Richards
03/01/2023, 3:53 PM
Scott Sandre (Delta Lake)
03/01/2023, 4:47 PM
1000 number? Is it paging results? I know delta now does repartition before merge by default, could that have an impact?
No. If the table has 1000 parquet files at time T, and then at time T+1 you perform a merge, then delta will have to rewrite some parquet files, so surely there will be more than 1000 parquet files.
Matt Richards
03/01/2023, 4:49 PM
number of files read: 1000
Scott Sandre (Delta Lake)
03/01/2023, 4:50 PM
the table contains exactly 1000 files btw
Matt Richards
03/01/2023, 4:53 PM
Scott Sandre (Delta Lake)
03/01/2023, 5:00 PM
that doesn't tell you the number of actual parquet files?
I'm just trying to highlight that if you insert 1000 files, then "remove" 1000 files, then insert 1000 files, you will have 2000 files in your table, because we tombstone files rather than removing them immediately. So if you insert 1000 files, then do a merge, we may logically remove 100 files, then insert 100 files, and you will have 1100 files in your table.
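A rough sketch of the arithmetic above, in plain Python rather than Delta's actual implementation. It assumes the log can be modelled as a list of add/remove actions, that a removed file stays on disk as a tombstone, and that the current snapshot counts only files that were added and not later removed. The function name `count_files` and the file names are made up for illustration.

```python
def count_files(actions):
    """Replay (op, path) actions; return (files in snapshot, files on disk)."""
    live = set()     # files referenced by the current snapshot
    on_disk = set()  # every file ever written; tombstoned files stay here
    for op, path in actions:
        if op == "add":
            live.add(path)
            on_disk.add(path)
        elif op == "remove":
            live.discard(path)  # tombstone: logically removed, not deleted
        else:
            raise ValueError(f"unknown action: {op}")
    return len(live), len(on_disk)


# Initial write: 1000 files.
actions = [("add", f"part-{i}.parquet") for i in range(1000)]
# A merge that rewrites 100 of them: 100 logical removes plus 100 new files.
actions += [("remove", f"part-{i}.parquet") for i in range(100)]
actions += [("add", f"merged-{i}.parquet") for i in range(100)]

snapshot_files, physical_files = count_files(actions)
print(snapshot_files, physical_files)  # 1000 1100
```

In this model the snapshot still reports 1000 files after the merge, while 1100 files physically exist.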
Matt Richards
03/01/2023, 5:00 PM
chris fish
03/01/2023, 5:21 PM
describe detail tells you the size and number of files in each table's current snapshot
Matt Richards
03/01/2023, 5:55 PM
chris fish
03/01/2023, 6:19 PM
Matt Richards
03/01/2023, 6:34 PM
chris fish
03/01/2023, 8:38 PM
Matt Richards
03/02/2023, 8:19 AM
chris fish
03/02/2023, 6:11 PM
coalesces partitions down. so when you are reading in a lot of data, there might be 10,000 tasks reading in the data, and then it will coalesce down to 1000 tasks during a shuffle.
but if the number of tasks never goes above 1000, then that AQE feature will never kick in
Matt Richards
03/02/2023, 6:37 PM
chris fish
03/02/2023, 7:10 PM
Matt Richards
03/03/2023, 3:26 PM