Alberto Rguez
08/29/2023, 3:20 PM
Tom van Bussel
08/29/2023, 3:21 PM
Alberto Rguez
08/29/2023, 3:24 PM
Tom van Bussel
08/29/2023, 3:26 PM
Alberto Rguez
08/29/2023, 3:27 PM
Tom van Bussel
08/29/2023, 3:27 PM
Alberto Rguez
08/29/2023, 3:29 PM
Tom van Bussel
08/29/2023, 3:30 PM
Alberto Rguez
08/29/2023, 3:31 PM
Tom van Bussel
08/29/2023, 3:31 PM
Alberto Rguez
08/29/2023, 3:32 PM
Tom van Bussel
08/29/2023, 3:32 PM
Alberto Rguez
08/29/2023, 3:33 PM
explain formatted
select count(*) from d_bronze.goco.home_contents
Tom van Bussel
08/29/2023, 3:33 PM
Alberto Rguez
08/29/2023, 3:33 PM
explain formatted
select max(quoteid) from d_bronze.goco.home_contents
== Physical Plan ==
AdaptiveSparkPlan (9)
+- ColumnarToRow (8)
   +- PhotonResultStage (7)
      +- PhotonAgg (6)
         +- PhotonShuffleExchangeSource (5)
            +- PhotonShuffleMapStage (4)
               +- PhotonShuffleExchangeSink (3)
                  +- PhotonAgg (2)
                     +- PhotonScan parquet d_bronze.goco.home_contents (1)
Tom van Bussel
08/29/2023, 3:34 PM
Alberto Rguez
08/29/2023, 3:34 PM
== Physical Plan ==
LocalTableScan (1)
Tom van Bussel
08/29/2023, 3:35 PM
Alberto Rguez
08/29/2023, 3:35 PM
Tom van Bussel
08/29/2023, 3:36 PM
Alberto Rguez
08/29/2023, 3:37 PM
Dominique Brezinski
08/29/2023, 5:20 PM
Itai Yaffe
09/04/2023, 10:02 AM
SELECT COUNT(*) has relied on metadata since version 2.2.0, whereas SELECT MAX(col) still does not (see this PR and this thread)?
Also, @Tom van Bussel, are you saying that, as long as a column is an integer and is within the delta.dataSkippingNumIndexedCols boundary, a query such as MAX(col) should use the metadata (and not scan the entire table)? If that's the case, I guess that solves most of the issues that drove @Felipe Pessoto to open the aforementioned PR in the first place, right?
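To make the question above concrete: Delta stores per-file statistics (numRecords, minValues, maxValues, nullCount) as a JSON string in the stats field of each add action in the transaction log. Below is a minimal Python sketch, not Delta's actual implementation, of how COUNT(*) and MAX(col) could be answered from those statistics alone. It returns None (meaning "fall back to a scan") whenever a live file lacks usable stats for the column, for example because the column lies beyond the delta.dataSkippingNumIndexedCols boundary. The function names, and the assumption that the input is already the list of live add actions of a snapshot, are hypothetical. Deletion vectors are handled conservatively by refusing to answer.

```python
import json
from typing import List, Optional


def metadata_only_count(add_actions: List[dict]) -> Optional[int]:
    """Sum numRecords over live files; None means 'fall back to a scan'."""
    total = 0
    for add in add_actions:
        # Conservative assumption: rows hidden by a deletion vector are not
        # subtracted from numRecords here, so give up rather than over-count.
        if add.get("deletionVector") is not None or add.get("stats") is None:
            return None
        num_records = json.loads(add["stats"]).get("numRecords")
        if num_records is None:
            return None
        total += num_records
    return total


def metadata_only_max(add_actions: List[dict], col: str) -> Optional[int]:
    """Max of per-file maxValues[col]; None means 'fall back to a scan'.

    Simplification: if every file is all-NULL for col this also returns
    None, which a real engine would instead answer as SQL NULL.
    """
    best: Optional[int] = None
    for add in add_actions:
        if add.get("deletionVector") is not None or add.get("stats") is None:
            return None
        stats = json.loads(add["stats"])
        file_max = stats.get("maxValues", {}).get(col)
        if file_max is None:
            # No max stat: either the column is not indexed (e.g. outside
            # delta.dataSkippingNumIndexedCols) or the file is all NULLs.
            num_records = stats.get("numRecords")
            if num_records is not None and stats.get("nullCount", {}).get(col) == num_records:
                continue  # every value in this file is NULL; it cannot hold the max
            return None
        if best is None or file_max > best:
            best = file_max
    return best
```

The sketch prefers returning None over guessing, which mirrors the "integer and within the indexed-column boundary" condition in the question. For string columns, Delta's max statistics may be truncated prefixes, so they would not give an exact MAX even when present.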