
Brayan Jules

06/15/2023, 5:17 PM
Hi all, is this a fair comparison between the current state of delta and other table formats? Also, if you know of good articles that do similar comparisons please share, thanks. https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison

Taylor Beever

06/15/2023, 7:28 PM
One of my takeaways is that this article is posted by the company (Onehouse <> Apache Hudi) that has a managed platform using the technology that conveniently wipes the floor with the other two… which feels a little suspect, but I'm not saying it's wrong per se.

chris fish

06/15/2023, 7:37 PM
i work at databricks so i’m also biased. historically, delta was significantly more performant than hudi/iceberg. hudi/iceberg are getting more mature and the performance gap is not as significant anymore, and all 3 have added lots of new features
realistically, it’s like with any data system: you’ll want to run your own benchmarks and figure out what your decision criteria are
one thing i’ll note, you’ll see lots of claims about open source and fake open source. to me, here’s what really matters:
• Delta Lake - owned by the Linux Foundation, same foundation that owns Linux. Apache v2 open source license. backed by Databricks, a company that charges money for Data Processing Compute. Databricks doesn’t charge for Delta Lake storage.
• Apache Hudi - owned by the Apache Software Foundation. Apache v2 open source license. backed by Onehouse, a company that charges money for storing data in Hudi and managing that data.
• Apache Iceberg - owned by the Apache Software Foundation. Apache v2 open source license. backed by Tabular, a company that charges money for storing data in Iceberg and managing that data.
all 3, you can use them with or without the vendors behind them, and they all use the same open source license, and are owned by nonprofit foundations

Brayan Jules

06/15/2023, 7:48 PM
I think this is a great paper on performance comparison: https://petereliaskraft.net/res/cidr_lakehouse.pdf. I was looking for a feature-wise comparison, though. I will do the comparison myself. I am a little bit biased toward Delta as well, but I just wanted to have an objective comparison of the capabilities of each.
👍 1
@Taylor Beever Yes, that's why I asked about a fair comparison, or at least the view from the Delta side.
👍 2

Taylor Beever

06/15/2023, 8:01 PM
Thanks @chris fish for providing lots of context
👍 1

Matthew Powers

06/15/2023, 11:13 PM
@Brayan Jules - I’d say there are a lot of errors in that post. We should collab on a blog post to give a more factual representation of the current state 😉

Robin Moffatt

06/16/2023, 7:29 AM
there's another comparison post here FWIW - I've not gone through it in detail so won't argue as to its veracity or otherwise 🙂 https://lakefs.io/blog/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/

Matthew Powers

06/16/2023, 11:12 AM
@Robin Moffatt - can we collaborate to update that post?
• the symlink.txt file isn’t required anymore
• Does “Delta Engine” refer to Photon?
• Think multi-cluster writes are supported
• delta-rs makes Delta Lake a great option for organizations/workflows that don’t use Spark at all
• I think this paper provides users with really good benchmarks as @Brayan Jules mentioned

Robin Moffatt

06/16/2023, 11:13 AM
@Matthew Powers definitely! /cc @Oz Katz
🙏 1