Eric Bellet

05/17/2023, 10:45 AM
Hello, I have a question about Apache Delta vs Apache Hudi for upsert operations. Which one is better? My company ran a benchmark six months ago, and Apache Hudi was faster for upsert operations (Hudi 0.12.1 / Delta 2.0.1 / Iceberg 1.0.0). Is that still true? If so, they will use Hudi instead of Delta.

Vincent Chee

05/17/2023, 11:15 AM
The benchmark doesn't show a significant difference between Hudi and Delta. You should be good with either.

JosephK (exDatabricks)

05/17/2023, 12:26 PM
Hudi might be better for upserts, but there is more than just that.

Denny Lee

05/17/2023, 9:31 PM
Small clarification - we’re Linux Foundation Delta Lake 🙂. It wouldn’t surprise me if there are scenarios where Hudi, Iceberg, or Delta each outperform the others. For example, this blog notes that Delta is faster than Hudi using the TPC-DS benchmark. A lot of this has to do with your dataset and configuration. Also, we don’t have all the information on how these systems are configured, which version of Spark you’re using, and what environment you’re running in, all of which come into play here.

Eric Bellet

05/19/2023, 7:38 AM