Kevin Lim
07/23/2023, 1:00 AM
Yuri Niitsuma
07/23/2023, 1:07 PM
Are there any open source/selfhosted catalogs one can use with delta lake?
For open source, you can use a Hive catalog. Example of a simple (not production-ready) Hive catalog server: https://github.com/ignitz/jibaro/blob/main/lake_lab/hive/Dockerfile
Kevin Lim
07/23/2023, 6:26 PM
Jordan Fox
07/23/2023, 10:31 PM
Kevin Lim
07/24/2023, 12:07 AM
Jordan Fox
07/24/2023, 12:30 AM
DeltaTable.from_data_catalog() calls RawDeltaTable.get_table_uri_from_data_catalog, and the DataCatalog class currently only has AWS='glue' and UNITY='unity'.
The get_table_uri_from_data_catalog call shows:
#[classmethod]
fn get_table_uri_from_data_catalog(
    _cls: &PyType,
    data_catalog: &str,
    database_name: &str,
    table_name: &str,
    data_catalog_id: Option<String>,
) -> PyResult<String> {
    let data_catalog =
        deltalake::data_catalog::get_data_catalog(data_catalog).map_err(|_| {
            PyValueError::new_err(format!("Catalog '{}' not available.", data_catalog))
        })?;
    let table_uri = rt()?
        .block_on(data_catalog.get_table_storage_location(
            data_catalog_id,
            database_name,
            table_name,
        ))
        .map_err(|err| PyIOError::new_err(err.to_string()))?;
    Ok(table_uri)
}
That call returns a "not supported yet" error if the catalog isn't in the DataCatalog class.
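To make the lookup behaviour concrete, here is a rough pure-Python sketch of the same idea. It is not the deltalake implementation: `resolve_table_uri` is a hypothetical name, the returned URI is a made-up placeholder rather than a real storage location, and only the two catalog values and the error message are taken from the snippet above.

```python
from enum import Enum
from typing import Optional


class DataCatalog(Enum):
    """Mirrors the two catalog values mentioned above."""

    AWS = "glue"
    UNITY = "unity"


def resolve_table_uri(
    data_catalog: str,
    database_name: str,
    table_name: str,
    data_catalog_id: Optional[str] = None,  # accepted but unused in this sketch
) -> str:
    """Hypothetical stand-in for the catalog lookup; returns a placeholder URI."""
    try:
        catalog = DataCatalog(data_catalog)
    except ValueError:
        # Unknown catalog names are rejected before any lookup happens.
        raise ValueError(f"Catalog '{data_catalog}' not available.") from None
    # A real catalog would be queried here; this just builds a fake string.
    return f"{catalog.value}://{database_name}/{table_name}"


# A known catalog resolves; an unknown one (e.g. "hive") is rejected.
print(resolve_table_uri("glue", "sales", "orders"))
try:
    resolve_table_uri("hive", "sales", "orders")
except ValueError as err:
    print(err)
```

The point of the sketch is only that the catalog name is checked against a fixed set of values first, so anything outside that set fails before a table location is ever requested.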
So, tl;dr: yes, only Glue and Unity if you want to load your table from a catalog. You can still load your table from literally anywhere else, though.
rtyler
07/24/2023, 12:36 AM
Jordan Fox
07/24/2023, 12:37 AM
rtyler
07/24/2023, 12:47 AM
Jordan Fox
07/24/2023, 12:49 AM
Kevin Lim
07/24/2023, 1:40 AM
terminate called without an active exception
import pandas as pd
from deltalake.writer import write_deltalake

# Create the table with an initial write.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [3, 2, 1, 2, 5]})
write_deltalake("dlake", df)

# Append the same rows to the existing table.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [3, 2, 1, 2, 5]})
write_deltalake("dlake", df, mode="append")
Matthew Powers
07/25/2023, 6:01 PM
Kevin Lim
07/25/2023, 6:17 PM
Matthew Powers
07/25/2023, 6:41 PM