
Yousry Mohamed

07/04/2023, 10:25 AM
For the UniForm feature, it seems it creates Iceberg metadata in v1 format, if I am not wrong. Is there any way to write Iceberg in v2 format? Many tools like AWS Athena (and probably BigQuery as well) support Iceberg v2, not v1. I could be wrong about that last statement as well, but I could not really get BigQuery to understand the Iceberg metadata produced by Delta Lake.

Sirui Sun

07/12/2023, 2:24 PM
Hey Yousry - I’m the PM for UniForm @ Databricks. Is there a particular Iceberg v2 feature that you were hoping UniForm would support? The main one we know about is positional deletes (aka merge-on-read). We plan to support this in UniForm (and by extension to use Iceberg v2) eventually. That said, note that neither BQ nor Snowflake support reading Iceberg tables with positional deletes, which is why we haven’t prioritized it highly yet.
> could not really get BigQuery to understand iceberg metadata produced by delta lake.
Could you share the error you’re seeing?

Yousry Mohamed

07/12/2023, 10:07 PM
Hi Sirui, thanks for your feedback. I just wanted to have an Iceberg table produced by Delta Lake and accessible by BigQuery. This did not work, hence I started to investigate what the problem could be. I am not really after anything specific in Iceberg v2. I will grab the error details from BQ and share them shortly.
Here is the error:
Error: Error while reading table: default.myexternal_table, error message: Unexpected directory structure for Iceberg data file gs://<my-bucket>/table4/part-00001-58be99bd-e886-4997-a97e-b936eb536a07-c000.snappy.parquet File: gs://<my-bucket>/table4/part-00001-58be99bd-e886-4997-a97e-b936eb536a07-c000.snappy.parquet
I was able to read the same table using Spark as follows:
spark.read.format("iceberg").load("gs://<my-bucket>/table4").show()
The following screenshots show the delta table root folder and iceberg metadata folder respectively.
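For reference, a sketch of how a table like this can be produced (assuming Spark with Delta Lake 3.0+ and UniForm available; the `delta.universalFormat.enabledFormats` table property is the one documented for UniForm, and the table name and schema here are hypothetical):

```python
# Sketch: create a Delta table with UniForm enabled, so Iceberg metadata
# is generated alongside the Delta log. Assumes an active Spark session
# (`spark`) configured with Delta Lake; names are placeholders.
spark.sql("""
    CREATE TABLE table4 (id INT, name STRING)
    USING delta
    LOCATION 'gs://<my-bucket>/table4'
    TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
""")
```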

Sirui Sun

07/13/2023, 5:17 PM
Hey Yousry - you’ll need to set spark.databricks.delta.write.dataFilesToSubdir to true. This is a Spark config which has UniForm write to the special data subdirectory that BigQuery expects.
Apologies if this was unclear! We’ll aim to make this clearer in the documentation.
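A minimal sketch of the fix Sirui describes (the config name is as given in this thread; everything else assumes an active Spark session with Delta Lake + UniForm, and the DataFrame and paths are hypothetical):

```python
# Sketch: have UniForm place data files under the <table>/data/
# subdirectory, the layout BigQuery's Iceberg reader expects.
# Assumes an active Spark session (`spark`) with Delta Lake configured.
spark.conf.set("spark.databricks.delta.write.dataFilesToSubdir", "true")

# Subsequent writes to a UniForm-enabled table then use that layout.
df.write.format("delta").mode("append").save("gs://<my-bucket>/table4")
```

Note this config affects only newly written files; existing data files at the table root keep their old paths.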

Yousry Mohamed

07/13/2023, 10:54 PM
Thank you @Sirui Sun, it seems I missed that part of the documentation. I will give it a go.
Working like a charm, and I created a small post about it as well: https://levelup.gitconnected.com/delta-lake-universal-format-a-first-look-9dfa28b68b72

Sirui Sun

07/18/2023, 12:03 AM
Amazing! Thanks for putting this together.