#random

Martin

08/28/2023, 4:06 PM
Attribute Lineage for PySpark DataFrame
My question is not Delta-specific, sorry for that. I was wondering whether it is possible to extract attribute lineage from the execution plan of a PySpark DataFrame. Pseudo-example:
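from pyspark.sql.functions import col

# Assumes an active SparkSession `spark` and a table "myTable" with columns A and B.
df = spark.table("myTable").withColumn("C", col("A") + col("B")).withColumnRenamed("A", "Z")

# magicLineageAnalyzer is hypothetical; the desired output would be:
magicLineageAnalyzer(df)
# >> df.Z <-- myTable.A
# >> df.B <-- myTable.B
# >> df.C <-- myTable.A, myTable.B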
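
To make the desired output concrete, here is a minimal pure-Python sketch of the bookkeeping such an analyzer would have to do. It does not use Spark or read an execution plan. `LineageFrame` and its methods are hypothetical names invented for this illustration; they only mimic `withColumn` and `withColumnRenamed` while recording which source attributes each column depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageFrame:
    """Toy stand-in for a DataFrame (hypothetical, not a Spark API).

    Maps each column name to the set of source attributes it derives from.
    """
    lineage: dict  # column name -> frozenset of "table.column" strings

    @staticmethod
    def from_table(table, columns):
        # Every column of a base table depends only on itself.
        return LineageFrame({c: frozenset({f"{table}.{c}"}) for c in columns})

    def with_column(self, name, *inputs):
        # A derived column inherits the union of its inputs' sources.
        sources = frozenset().union(*(self.lineage[c] for c in inputs))
        return LineageFrame({**self.lineage, name: sources})

    def with_column_renamed(self, old, new):
        # Renaming keeps column order and sources; only the name changes.
        return LineageFrame(
            {(new if c == old else c): s for c, s in self.lineage.items()}
        )

    def report(self):
        # One line per output column, sources sorted for stable output.
        return [
            f"df.{c} <-- {', '.join(sorted(s))}" for c, s in self.lineage.items()
        ]


# Mirrors the pseudo-example: C = A + B, then A is renamed to Z.
df = (
    LineageFrame.from_table("myTable", ["A", "B"])
    .with_column("C", "A", "B")
    .with_column_renamed("A", "Z")
)
for line in df.report():
    print(line)
```

A real analyzer would have to recover the same dependencies from Spark's logical plan instead of from explicit calls like these.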

Itagyba Abondanza Kuhlmann

08/28/2023, 8:44 PM
There is a tool for that called Spline. I’ve never used it, but I have been considering it recently. https://github.com/AbsaOSS/spline-spark-agent