Hi guys, is it possible to upsert when a table is not Delta, for example, when it is saved in Parquet format?
PAWAN SHUKLA
02/21/2023, 5:30 PM
No.
Martin
02/21/2023, 5:31 PM
No, you'd need to read the target table into a DataFrame, read the source table into a DataFrame, combine the two so that the result reflects the desired outcome, and then overwrite the full target table.
Lucas Zago
02/21/2023, 5:51 PM
@Martin
Can I capture inserts and updates this way?
full = (t1.alias("a")
        .join(t2.alias("b"), col("ca") == col("cb"), "inner")
        .select(*cols))

(full.write
     .format("parquet")
     .mode("overwrite")
     .option("overwriteSchema", "true")
     .save("path"))
Christopher Grant
02/21/2023, 7:37 PM
you definitely can do it, but it's pretty complex and not efficient. that's why formats like Delta Lake are so convenient: as a user, these APIs are given to you.
ultimately you'd do the same thing that Delta Lake's MERGE does with a couple of joins (you can learn more from this video, which is a little out of date, but the fundamentals have not changed)
of course, i recommend just using delta.
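To make the "couple of joins" idea concrete, here is a toy sketch in plain Python, with sets standing in for the join keys of the two tables (the function name `classify_keys` is made up for illustration, and this is not how Delta Lake implements MERGE). It shows why an inner join alone only finds updates: the inserts are the source keys with no match in the target.

```python
def classify_keys(target_keys, source_keys):
    """Split keys into the three groups an upsert has to handle."""
    target_keys, source_keys = set(target_keys), set(source_keys)
    return {
        "updates": target_keys & source_keys,    # keys an inner join returns
        "inserts": source_keys - target_keys,    # source keys missing from target
        "untouched": target_keys - source_keys,  # target keys missing from source
    }


# Target holds keys 1 and 2; source brings a change for 2 and a new key 3.
result = classify_keys([1, 2], [2, 3])
print(result)  # {'updates': {2}, 'inserts': {3}, 'untouched': {1}}
```

The final table needs all three groups, which is why a single inner join followed by an overwrite would drop both the new rows and the unchanged ones.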
Lucas Zago
02/21/2023, 8:08 PM
I agree with that. The problem is that they only convert the tables to Delta in the gold layer 😐
Actually, I have to do a left join if I want to bring in old + new records.
Christopher Grant
02/21/2023, 8:52 PM
I don't know any specifics, but it's probably easier to just move the pipeline and tables over to Delta rather than reinventing the wheel with Parquet.