#random

Martin

06/29/2023, 4:32 PM
Delta Lake 3.0
Out of nowhere, yesterday Databricks announced the preview release of Delta Lake 3.0.0. I was wondering: was this a secret ninja project of Databricks? How does this fit together with Delta having been officially open source since 2022? Was the OSS community part of this release, or was it implemented exclusively inside Databricks? I did not hear about it until a day ago. Don't get me wrong: I like all these new features, but I don't understand the governance model of the Delta Lake OSS project if Databricks is pulling all the strings in the background. And if DB is serious about OSS, I don't understand why features like
identity columns
have still not been donated to OSS, contrary to promises made a year ago:

https://www.databricks.com/wp-content/uploads/2022/06/db-247-blog-img-3.png

❤️ 1
👀 3

Dominique Brezinski

06/29/2023, 4:51 PM
There are TSC members who are not Databricks employees. It may feel like Databricks is driving simply because they have a team focused full-time on feature development in Delta. The Databricks team opens issues and submits design docs for features the same as all other contributors, and the project is owned and governed by the Linux Foundation.

Robert Kossendey

06/29/2023, 4:53 PM
Does that mean the new UniForm feature was publicly known before the AI Summit announcement?

Denny Lee

06/29/2023, 6:47 PM
Hey @Martin, we chatted a while ago and I think I forgot to follow up on it. Yes, we are still very much planning to release
identity columns
it's just that we had other overriding priorities as determined by the TSC, as @Dominique Brezinski noted. As @Michael Armbrust noted in his keynote, there were three overriding priorities (among many others) which were showcased today:
• Delta-kernel: We started building this because there are a lot of connectors (yay), but with at least 8 separate implementations of the Delta protocol, it was getting difficult to ensure that all of the different APIs built by different organizations work together seamlessly. This ran the risk of one version of the API being incompatible with another version even if they had the same table features. To ensure this didn't happen, we prioritized the creation of the kernel so we could simplify building connectors with the community.
• UniForm: Another major ask by the community was to simplify the process of moving between the different formats, such as Iceberg and Hudi. Originally the ask was to simplify migration, but through some fun development and testing, it dawned on us that we could build UniForm so that you would minimize the need to do any migration at all. This was a big ask by the community as well, as so much of their time was spent choosing or migrating between formats, and this project would significantly simplify the process.
• Liquid: Similar to the other two, a lot of the issues were around whether we could simplify the partitioning structure so developers weren't spending so much time optimizing performance around partitions. And as Michael noted, Liquid should be able to help solve this problem too!
Back to your first point,
identity columns
and many other features are already on the roadmap. As I had noted in other threads, we have not communicated the planning as well as we should have, and we should be able to focus on this once we get through the bulk of Delta 3.0.0 testing. HTH!
👍 5

Martin

06/29/2023, 8:45 PM
thanks for explaining

Denny Lee

06/30/2023, 5:32 AM
No problem, but I did want to also call out that I think you're right for calling this out. We need to do a better job explaining our roadmap and reasoning to the whole community much earlier in the process. The fact that I can explain this verbosely is great - but this should have been done a few months earlier and then you wouldn't need to ask the question.
❤️ 8

Matthew Powers

06/30/2023, 8:21 PM
@Martin - would also like to note that delta-io/delta is one of many implementations of the Delta Lake transaction log protocol. dask-deltatable, delta-rs, and delta-go are other implementations that are all run by different communities. Not every feature is supported by every implementation; Microsoft Fabric V-Ordering is one example.