Jeremy Jordan
05/15/2023, 7:17 PM
chris fish
05/15/2023, 7:21 PM
By default, Delta Lake on Databricks collects statistics on the first 32 columns defined in your table schema. You can change this value using the table property delta.dataSkippingNumIndexedCols. Collecting statistics on more columns adds overhead as you write files.
Collecting statistics on long strings is an expensive operation. To avoid it, you can either set the table property delta.dataSkippingNumIndexedCols low enough to exclude the columns containing long strings, or move those columns to a position in the schema beyond delta.dataSkippingNumIndexedCols using ALTER TABLE ALTER COLUMN. See ALTER TABLE.
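For illustration, here is a minimal sketch of the two options above, written as Python helpers that only build the SQL strings. The table name `events`, the columns `raw_payload` and `event_type`, and the value 5 are hypothetical; the sketch assumes `event_type` is the last column in the schema, so moving `raw_payload` after it places it beyond the first 5 columns.

```python
def set_stats_column_count(table: str, num_cols: int) -> str:
    """Build the statement that sets how many leading columns get statistics."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.dataSkippingNumIndexedCols' = '{int(num_cols)}')"
    )


def move_column_after(table: str, column: str, after: str) -> str:
    """Build the statement that repositions `column` directly after `after`."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} AFTER {after}"


# Hypothetical table and column names, for illustration only.
statements = [
    # Option 1: collect statistics on only the first 5 columns.
    set_stats_column_count("events", 5),
    # Option 2: move the long-string column past the indexed columns.
    move_column_after("events", "raw_payload", "event_type"),
]

for statement in statements:
    print(statement)
```

In a Databricks notebook, each string could then be passed to `spark.sql(...)`. Check the current schema order first, since the second statement only helps if the target position really is past the indexed columns.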
Lennart Skogmo
05/15/2023, 7:21 PM
chris fish
05/15/2023, 7:22 PM
site:docs.databricks.com will usually help you find the thing you’re looking for
Jeremy Jordan
05/15/2023, 7:23 PM