
Christian Pfarr

03/13/2023, 9:54 PM
Hey guys, I'm a little confused and maybe you can help me with that. I'm currently experimenting with Delta Lake on MinIO and I love the simplicity of Delta tables with direct paths for my SQL commands. Now I've tried to switch my examples to Delta table names and namespaces. Everything is fine as long as I create my table within a Spark session and play around with it. If I start another session and try to read the table by name, I always get an
AnalysisException: Table or view not found: default.delta_table;
It doesn't matter whether I use "default" as the namespace or my own name... (Yes, the table data and Delta log are there in MinIO, so everything is fine from a storage perspective, and I can read the table by path.) Could you explain why I can only see my tables during a session, and what I could do to see my tables from other sessions as well?
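The symptom can be sketched like this (a minimal example; the bucket and table names are hypothetical stand-ins): reading by path works in any session because Delta Lake keeps its own metadata in the table's `_delta_log` directory, while reading by name requires a catalog lookup.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reading by path works in any session: Delta finds all the metadata it
# needs in the table's _delta_log directory on object storage.
df = spark.read.format("delta").load("s3a://lakehouse/delta-warehouse/delta_table")

# Reading by name requires a catalog entry mapping the name to that path.
# Without a persistent metastore, a fresh session has no such entry and
# this raises: AnalysisException: Table or view not found: default.delta_table
df = spark.table("default.delta_table")
```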

Dominique Brezinski

03/13/2023, 10:34 PM
I assume this is not on Databricks, so I suspect it is because you don't have a persistent metastore configured. Saving the table name to path association requires persistent storage of that metadata, which is done through a metastore. Delta Lake persists all the metadata necessary to read and do transactions against the table in its log in the same object storage as the data, but if you want to access it by anything other than storage path, something has to keep a record of the table name to storage path, and that is done by the metastore.
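The distinction can be sketched as follows (a hypothetical example, not the original code): `save(path)` writes only the data plus the Delta log to storage, while `saveAsTable(name)` additionally records the name-to-path mapping in whatever metastore the session is using.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# Writes data + _delta_log to object storage only; no catalog entry.
# Any session can read this table back by its path.
df.write.format("delta").save("s3a://lakehouse/delta-warehouse/delta_table")

# Additionally registers default.delta_table -> storage path in the
# session's catalog. Without a persistent metastore, that mapping lives
# in an ephemeral embedded catalog and is invisible to other sessions.
df.write.format("delta").saveAsTable("default.delta_table")
```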

Christian Pfarr

03/14/2023, 6:46 AM
OK, I understand that, but how do I do that if I have to set the Spark catalog to "org.apache.spark.sql.delta.catalog.DeltaCatalog"?
from pyspark.sql import SparkSession

delta_catalog_location = "s3a://lakehouse/delta-warehouse"

spark = SparkSession.builder \
    .appName("DeltaCDF") \
    .config("spark.executor.memory", "2g") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.sql.warehouse.dir", delta_catalog_location) \
    .getOrCreate()
Shouldn't setting spark.sql.warehouse.dir to a shared location do the job? How do I configure my session with a different metastore if I don't have Hive or Glue etc.?
OK, I think I got it now:
enableHiveSupport()
does the trick in that scenario... It's just a little bit unfortunate that I need a Hive metastore to handle this. I thought the Delta catalog would be able to scan the warehouse directory for databases and load them on demand 😕
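For reference, the earlier session builder with Hive support enabled might look like this (a sketch, not a definitive setup: by default enableHiveSupport() uses an embedded Derby metastore_db in the driver's working directory, which is still per-machine, so for truly shared access you would point the metastore at a shared database or a standalone Hive Metastore service; the JDBC URL below is hypothetical):

```python
from pyspark.sql import SparkSession

delta_catalog_location = "s3a://lakehouse/delta-warehouse"

spark = (
    SparkSession.builder
    .appName("DeltaCDF")
    .config("spark.executor.memory", "2g")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.warehouse.dir", delta_catalog_location)
    # Assumed example: back the metastore with a shared database instead
    # of the default embedded Derby, so other sessions see the tables.
    .config("spark.hadoop.javax.jdo.option.ConnectionURL",
            "jdbc:postgresql://metastore-db:5432/metastore")
    .enableHiveSupport()
    .getOrCreate()
)
```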