Tim Burns
03/26/2023, 7:50 PM
```python
from delta.tables import DeltaTable

# Load the Delta Lake table
delta_table = DeltaTable.forPath(spark, "/path/to/delta_table")
# Get the metadata of the table
metadata = delta_table.metadata
# Print the metadata
print(metadata)
```
However, I find that this doesn't work, even though the metadata JSON is right there, so I'm now writing code like this to get a simple table schema out of the table metadata:
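```python
import json
import os
from os.path import join

# Runs inside a method: table_path is the table root, self.catalog_metadata caches results
delta_dir = join(table_path, "_delta_log")
metadata = None
# Sort so commits are replayed in version order and the newest metaData wins
for file in sorted(os.listdir(delta_dir)):
    if file.endswith(".json"):
        with open(join(delta_dir, file)) as json_file:
            for json_line in json_file:
                json_obj = json.loads(json_line)
                if "metaData" in json_obj:
                    metadata = json_obj["metaData"]
                    self.catalog_metadata[table_path] = metadata
# schemaString is itself a JSON document describing the columns
schema_string = metadata["schemaString"]
result = json.loads(schema_string)
```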
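For reference, here is a self-contained sketch of the same log-scanning idea using only the Python standard library. It assumes every JSON commit since the last schema change is still present in `_delta_log`; it does not read `.checkpoint.parquet` files, so a table whose early commits were cleaned up after a checkpoint would need the checkpoint read as well. `latest_schema` is a hypothetical helper name, and the demo writes a made-up two-commit log into a temporary directory instead of touching a real table.

```python
import json
import os
import tempfile


def latest_schema(table_path):
    """Return (name, type) pairs from the newest metaData action in _delta_log.

    Sketch assumption: all JSON commits since the last schema change are still
    on disk; .checkpoint.parquet files are not read.
    """
    delta_dir = os.path.join(table_path, "_delta_log")
    metadata = None
    # Commit files use zero-padded version numbers, so a lexical sort replays
    # them oldest to newest and the last metaData action seen wins.
    for name in sorted(os.listdir(delta_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(delta_dir, name)) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "metaData" in action:
                    metadata = action["metaData"]
    if metadata is None:
        raise ValueError(f"no metaData action found in {delta_dir}")
    # schemaString is itself JSON: a serialized Spark StructType.
    # Nested column types come back as dicts rather than plain strings.
    schema = json.loads(metadata["schemaString"])
    return [(field["name"], field["type"]) for field in schema["fields"]]


# Demo on a made-up two-commit log; version 1 adds a "name" column.
with tempfile.TemporaryDirectory() as demo_path:
    log_dir = os.path.join(demo_path, "_delta_log")
    os.makedirs(log_dir)
    id_col = {"name": "id", "type": "long", "nullable": True, "metadata": {}}
    name_col = {"name": "name", "type": "string", "nullable": True, "metadata": {}}
    for version, fields in enumerate([[id_col], [id_col, name_col]]):
        schema_string = json.dumps({"type": "struct", "fields": fields})
        commit = {"metaData": {"id": "demo", "schemaString": schema_string}}
        with open(os.path.join(log_dir, f"{version:020d}.json"), "w") as f:
            f.write(json.dumps(commit) + "\n")
    columns = latest_schema(demo_path)

print(columns)  # [('id', 'long'), ('name', 'string')]
```

Sorting the file names matters: `os.listdir` returns entries in arbitrary order, and without the sort an older `metaData` action could overwrite a newer one.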
It seems like such a simple, basic question to ask of a Delta table: what is your schema? Why isn't it baked into the API? Or if it is, how do I get at it without resorting to writing code?
Thanks, Tim
Jim Hibbard
03/26/2023, 10:03 PM
`.schema` property:
```python
# load delta table
delta_table = DeltaTable.forPath(spark, '/path/to/delta_table')
delta_table.schema
```
• docs on schema property
Hope that helps! Let me know if you're looking for something slightly different. There are a couple of ways to approach this.
Tim Burns
03/27/2023, 10:31 AM
```python
table_df = delta_table.toDF()
table_df.show()
table_df.printSchema()  # Good
print(table_df.schema)
```
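`printSchema()` and the `schemaString` from the log describe the same thing: `schemaString` is a serialized Spark `StructType`. As a rough illustration, the stdlib-only sketch below renders a `schemaString` into a tree in approximately the layout `printSchema()` prints. `render_schema_tree` and `schema_tree_lines` are hypothetical helpers; they expand nested structs but show arrays and maps only by type name, and the sample schema is made up.

```python
import json


def schema_tree_lines(fields, prefix=" |"):
    """Hypothetical helper: one printSchema-style line per field, recursing into structs."""
    lines = []
    for field in fields:
        dtype = field["type"]
        # Primitive types are plain strings ("long", "string", ...);
        # struct, array and map types are JSON objects with a "type" key.
        type_name = dtype if isinstance(dtype, str) else dtype["type"]
        nullable = "true" if field["nullable"] else "false"
        lines.append(f"{prefix}-- {field['name']}: {type_name} (nullable = {nullable})")
        if isinstance(dtype, dict) and dtype["type"] == "struct":
            lines.extend(schema_tree_lines(dtype["fields"], prefix + "    |"))
    return lines


def render_schema_tree(schema_string):
    """Render a Delta schemaString roughly the way DataFrame.printSchema() does."""
    schema = json.loads(schema_string)
    return "\n".join(["root"] + schema_tree_lines(schema["fields"]))


# Made-up schema with one nested struct column.
schema_string = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "address", "nullable": True, "metadata": {},
         "type": {"type": "struct", "fields": [
             {"name": "city", "type": "string", "nullable": False, "metadata": {}},
         ]}},
    ],
})
tree = render_schema_tree(schema_string)
print(tree)
```

Inside PySpark itself this is unnecessary: `StructType.fromJson(json.loads(schema_string))` rebuilds the schema object, and `table_df.schema.json()` goes in the other direction.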
I'm guessing this will all become clear in time.
Jim Hibbard
03/27/2023, 2:18 PM
`.current_schema` property to make this easier.
Tim Burns
03/28/2023, 11:18 AM