
Lucas Zago

02/21/2023, 5:08 PM
Hi guys, is it possible to upsert when a table is not Delta, for example, one saved in Parquet format?

PAWAN SHUKLA

02/21/2023, 5:30 PM
No.

Martin

02/21/2023, 5:31 PM
No, you'd need to read the target table into a DataFrame, read the source table into a DataFrame, combine the two so that the result reflects the desired outcome, and then overwrite the full target table.
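
To make the combine step concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The `id` key column, the `upsert_rows` helper, and the sample rows are all assumptions for illustration and do not come from the thread. In PySpark the same shape is commonly written as a `left_anti` join of the target against the source, followed by a `unionByName` with the source, before overwriting the target.

```python
# Plain-Python stand-in for the read / combine / overwrite flow.
# The "id" key and the sample rows are hypothetical, for illustration only.

def upsert_rows(target, source, key="id"):
    """Return target with source applied: matching keys replaced, new keys appended."""
    merged = {row[key]: row for row in target}  # index existing rows by key
    for row in source:
        merged[row[key]] = row  # replaces a matching row, or adds a new one
    return list(merged.values())


target = [{"id": 1, "name": "old-a"}, {"id": 2, "name": "old-b"}]
source = [{"id": 2, "name": "new-b"}, {"id": 3, "name": "new-c"}]

# This full result is what would be written back over the target table.
result = upsert_rows(target, source)
```

Note that the whole target is rewritten on every run, which is the inefficiency the replies below point out.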

Lucas Zago

02/21/2023, 5:51 PM
@Martin In this way, can I capture inserts and updates?
full = (t1.alias("a")
        .join(t2.alias("b"), col("ca") == col("cb"), "inner")
        .select(*cols))
(full.write
     .format("parquet")
     .mode("overwrite")
     .option("overwriteSchema", "true")
     .save("path"))

Christopher Grant

02/21/2023, 7:37 PM
you definitely can do it, but it's pretty complex and not efficient. it's why formats like delta lake are so convenient for users - as a user, these APIs are given to you.
ultimately you'd do the same thing that delta lake's MERGE does with a couple of joins (you can learn more from this video, which is a little out of date, but the fundamentals have not changed)
of course, i recommend just using delta.
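
As a rough illustration of the "couple of joins" idea (a sketch only, not Delta Lake's actual MERGE implementation): source rows whose key already exists in the target are updates, which is what an inner join on the key finds, and source rows whose key is missing from the target are inserts, which is what an anti join finds. The `classify_changes` helper, the `id` key, and the sample rows are hypothetical.

```python
# Plain-Python sketch of splitting source rows into inserts and updates.
# Helper name, "id" key, and sample rows are assumptions for illustration.

def classify_changes(target, source, key="id"):
    """Split source rows by whether their key already exists in target."""
    target_keys = {row[key] for row in target}
    # Key present in target: what an inner join on the key would match.
    updates = [row for row in source if row[key] in target_keys]
    # Key absent from target: what an anti join on the key would keep.
    inserts = [row for row in source if row[key] not in target_keys]
    return inserts, updates


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}]

inserts, updates = classify_changes(existing, incoming)
```

An inner join alone therefore only surfaces the updates; the inserts need the second, anti-join pass.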

Lucas Zago

02/21/2023, 8:08 PM
I agree with that. The point is that they only convert the tables to Delta in the gold layer 😐
Actually, I have to do a left join if I want to bring in old + new records.

Christopher Grant

02/21/2023, 8:52 PM
i don't know any specifics, but it's probably easier to just move the pipeline and tables over to delta rather than re-inventing the wheel with parquet.