Antony Raj M

04/28/2023, 10:07 PM
Hi Team, I have a question. We are using open-source Delta Lake in the AWS ecosystem. We are on Athena 2.0, and to query Delta Lake tables we use symlink manifest files. I'm looking to share this table with consumers. There are 2 types of consumers: those who query from Athena, and those who consume the table from AWS EMR. Athena users need the symlink version and EMR users need the native Delta table version without the symlink. I don't want to create 2 different tables, one pointing to the symlink and another pointing to the data files. Is there a way I can work around this? Any suggestions would be really helpful.

Parthiban Jaganathan

04/28/2023, 11:20 PM
Athena supports native Delta reads from engine version 3. The only way is to upgrade your Athena version.

Dominique Brezinski

04/29/2023, 12:21 AM
If I understand the question correctly, producing the symlink manifests does not interfere with the table being read as a native Delta table. One table can serve both your purposes and be accessed by both systems. You just need to make sure the symlink manifests are produced after each write to the table.
But using native delta support in Athena is much preferable to manifest files.
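A minimal sketch of keeping the manifests in step with writes, assuming a Spark session that already has the Delta Lake extensions configured. The two SQL statements are the ones given in the Delta Lake documentation; the helper names and the example S3 path are hypothetical.

```python
def generate_manifest_sql(table_path: str) -> str:
    """One-off (re)generation of the symlink manifest for a path-based Delta table."""
    return f"GENERATE symlink_format_manifest FOR TABLE delta.`{table_path}`"


def enable_auto_manifest_sql(table_path: str) -> str:
    """Table property that asks Delta Lake to update the manifest on every write."""
    return (
        f"ALTER TABLE delta.`{table_path}` "
        "SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)"
    )


def set_up_manifests(spark, table_path: str) -> None:
    # `spark` is assumed to be a SparkSession with Delta Lake enabled;
    # `table_path` is the root of the Delta table, e.g. a hypothetical
    # "s3://example-bucket/events".
    spark.sql(generate_manifest_sql(table_path))
    spark.sql(enable_auto_manifest_sql(table_path))
```

With the table property set, writers should no longer need to rerun the generate command by hand after each update, and native readers keep reading the same table through the Delta log.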

Rahul Sharma

04/29/2023, 2:43 AM
I am also facing the same issue: I can't see the Delta table schema in Glue and Athena.
And once I have created the symlink, if the table is updated we have to run the symlink command again, which is a bad thing.

Parthiban Jaganathan

04/29/2023, 2:31 PM
Symlink is not a great solution. Athena, Presto, Snowflake have already moved away from it to native table reads.

Antony Raj M

04/30/2023, 1:21 AM
We are a little hesitant as an org to move to Athena version 3 anytime soon. That's why I'm trying to see if there is an alternative solution.

Dominique Brezinski

04/30/2023, 1:24 AM
Why are you hesitant?

Antony Raj M

04/30/2023, 1:58 AM
That's a different team, which I don't have control over. But let's say I move to version 3. What about Redshift Spectrum? Can it query Athena tables without manifest files?

Parthiban Jaganathan

04/30/2023, 6:23 PM
You can't use a single table to serve both read types (symlink and native Delta read). If you are using Redshift, I don't think it supports native reads yet.

Grainne B

05/02/2023, 12:23 AM
We're in the same boat @Antony Raj M! We will be using Redshift Spectrum to query tables, and currently we have to create an external table that points to the manifest file.

Dominique Brezinski

05/02/2023, 9:47 PM
When you say you can't use a single table for both read types, you mean the table definition in the Glue Catalog, not the actual physical Delta Lake tables, correct? Because a Delta table with symlink manifest generation is surely queryable both as a native Delta Lake table and through processing the symlink manifests.

Parthiban Jaganathan

05/02/2023, 10:05 PM
Yes, I meant the Glue catalog table definition.
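To make the distinction concrete, here is a rough sketch of two catalog definitions over one physical Delta table. The symlink DDL follows the form shown in the Delta Lake documentation for Presto/Athena, and the native DDL follows the form Athena documents for engine version 3. The function names, table names, column list, and S3 path are hypothetical, and partitioned tables would need extra clauses that are not shown.

```python
def symlink_table_ddl(name, columns, table_root):
    """DDL for manifest-based readers; LOCATION points at the manifest folder."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    root = table_root.rstrip("/")
    return (
        f"CREATE EXTERNAL TABLE {name} ({cols})\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{root}/_symlink_format_manifest/'"
    )


def native_table_ddl(name, table_root):
    """DDL for a native Delta reader (Athena engine version 3); LOCATION is the table root."""
    root = table_root.rstrip("/")
    return (
        f"CREATE EXTERNAL TABLE {name}\n"
        f"LOCATION '{root}/'\n"
        "TBLPROPERTIES ('table_type' = 'DELTA')"
    )


# Hypothetical example: both definitions refer to the same files on S3.
root = "s3://example-bucket/events"
print(symlink_table_ddl("events_symlink", [("id", "bigint"), ("ts", "timestamp")], root))
print(native_table_ddl("events_delta", root))
```

Only the catalog entries differ; the data files and the Delta log are shared, so nothing is duplicated on S3.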

Antony Raj M

05/03/2023, 5:45 PM
There is a small confusion here. When I say we need 2 tables, the reason is this: when I create a Glue table pointing to the symlink folder rather than the root folder, I'm able to query it through Athena and Redshift Spectrum, but I'm not able to query it through EMR. Is there a way I can query the symlink table through EMR with Delta Lake features?

Dominique Brezinski

05/05/2023, 2:50 PM
No, to get delta lake features you need a native reader that is working off the delta log, not symlink manifests. Really the right answer is moving forward on Athena to get native delta lake support, and push on Redshift to support the same.