Yuval Itzchakov

03/23/2023, 9:00 AM
We see a phenomenon where queries that only compute MAX(TIMESTAMP) on Delta tables trigger a scan instead of a metadata lookup. Has anyone encountered this?
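For context, a metadata-only MAX means answering the query from the per-file statistics in the Delta transaction log instead of reading the data files. The sketch below assumes a simplified log given as JSON lines containing only `add` actions (each carrying a `stats` JSON string with `maxValues`) and `remove` actions. Real tables also use checkpoint files and more fields per action, which this ignores. The helper `make_add` and the file names are hypothetical.

```python
import json
from typing import Optional


def max_from_delta_log(commit_lines: list[str], column: str) -> Optional[object]:
    """Return max(maxValues[column]) over live files, or None when any live
    file lacks that statistic (or the table is empty), i.e. fall back to a scan."""
    live = {}  # path -> parsed stats dict (or None if the add had no stats)
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            live[add["path"]] = json.loads(add["stats"]) if add.get("stats") else None
        elif "remove" in action:
            live.pop(action["remove"]["path"], None)

    best = None
    for stats in live.values():
        value = (stats or {}).get("maxValues", {}).get(column)
        if value is None:
            return None  # one file without stats rules out a metadata-only answer
        if best is None or value > best:
            best = value
    return best


# Hypothetical helper that builds a simplified `add` action for the example.
def make_add(path: str, max_id: int) -> str:
    stats = {"numRecords": 1, "maxValues": {"id": max_id}}
    return json.dumps({"add": {"path": path, "stats": json.dumps(stats)}})


log = [
    make_add("part-0.parquet", 10),
    make_add("part-1.parquet", 42),
    json.dumps({"remove": {"path": "part-1.parquet"}}),  # 42 is no longer live
    make_add("part-2.parquet", 7),
]
print(max_from_delta_log(log, "id"))  # 10
```

Whether a given engine actually takes this shortcut, and for which column types, is up to the engine.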

JosephK (exDatabricks)

03/23/2023, 11:47 AM
This is a known issue that has to do with how timestamps are stored. There is currently an open ticket to resolve it.

Sherlock Beard

03/23/2023, 11:48 AM
Oh, is it not related to the min/max optimization? 😅

JosephK (exDatabricks)

03/23/2023, 11:56 AM
I’m pre-coffee, so I don’t know the exact details, but I think it’s because a timestamp can be stored with either microsecond or millisecond precision, and depending on which one is used, it can mess up the max.
It looks like min/max only works with numeric and date columns for metadata queries. File skipping will work on all columns.
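The precision issue described above can be illustrated as follows. This assumes, for illustration only, that the writer records timestamp statistics truncated to millisecond precision while the data keeps microseconds; the function names are hypothetical. Under that assumption, the stored statistic is only a lower bound on the real maximum, so an exact MAX cannot come from metadata. A filter can still skip files safely by allowing for the lost precision.

```python
from datetime import datetime, timedelta


def truncate_to_millis(ts: datetime) -> datetime:
    # Assumed stats writer: keeps millisecond precision, drops the rest.
    return ts.replace(microsecond=ts.microsecond // 1000 * 1000)


def can_skip_for_greater_than(stats_max: datetime, bound: datetime) -> bool:
    """For a filter `ts > bound`, a file may be skipped only if its true max
    cannot exceed `bound`. Truncation means true max < stats_max + 1 ms."""
    return stats_max + timedelta(milliseconds=1) <= bound


true_max = datetime(2023, 3, 23, 9, 0, 0, 123456)  # real value in the data file
stats_max = truncate_to_millis(true_max)            # 09:00:00.123000 in the log

# A metadata-only MAX would return stats_max, which is smaller than the real max.
print(stats_max < true_max)  # True

# File skipping stays correct by widening the bound by the lost precision.
print(can_skip_for_greater_than(stats_max, datetime(2023, 3, 23, 9, 0, 1)))          # True
print(can_skip_for_greater_than(stats_max, datetime(2023, 3, 23, 9, 0, 0, 123200)))  # False
```

This is consistent with the observation that skipping works on all columns while exact metadata answers are limited to some column types.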

Sherlock Beard

03/23/2023, 12:08 PM
Just to be clear, we are talking about Delta OSS, right?

JosephK (exDatabricks)

03/23/2023, 12:27 PM
So technically, we’re talking about the query engine reading the files. Delta is just a file format/protocol, and how a query is executed against it depends on the engine, not on Delta itself.
You’ll get different results from the query depending on what you use to read the table. In this specific example, you might get better results for that query from a traditional database such as Snowflake than from Databricks.

Yuval Itzchakov

03/26/2023, 3:35 AM
@JosephK (exDatabricks) Do you have a link to the open ticket?
Also, my understanding is that this optimization only works for partition columns, not for arbitrary columns in the table. Is that right?
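On the partition-column question: in the Delta log, each `add` action records the file's partition values in a `partitionValues` map, and every row in that file shares those values. An exact MAX for a partition column can therefore be read from the log without any column statistics. The sketch below assumes a simplified log of JSON lines with only `add` and `remove` actions (checkpoints and other fields are ignored); `make_add` and the paths are hypothetical. It only shows why partition columns are the easy case and does not imply that other columns can never be answered from statistics.

```python
import json
from typing import Optional


def max_partition_value(commit_lines: list[str], column: str) -> Optional[str]:
    """Exact MAX of a partition column using only add/remove actions."""
    live = {}  # path -> partition value (a string, or None for a null partition)
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            live[add["path"]] = add["partitionValues"].get(column)
        elif "remove" in action:
            live.pop(action["remove"]["path"], None)

    # Partition values are serialized as strings: ISO dates compare correctly
    # as text, but numeric partitions would need parsing before comparing.
    values = [v for v in live.values() if v is not None]
    return max(values) if values else None


# Hypothetical helper building a simplified `add` action for a `day` partition.
def make_add(day: str) -> str:
    path = f"day={day}/part-0.parquet"
    return json.dumps({"add": {"path": path, "partitionValues": {"day": day}}})


log = [
    make_add("2023-03-22"),
    make_add("2023-03-23"),
    json.dumps({"remove": {"path": "day=2023-03-23/part-0.parquet"}}),
]
print(max_partition_value(log, "day"))  # 2023-03-22
```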