Itai Yaffe

07/27/2023, 3:01 PM
Hey folks, Spreading a Delta Table across multiple buckets/storage accounts. In many (if not all) object stores, there are various limitations w.r.t. throughput (i.e. read/write requests per second), the maximum total size, etc., within a single bucket (I've seen it both in AWS S3 and Azure ADLS). It's true that if one is using AWS S3, they could potentially follow S3 best practices and scale read/write operations within a single bucket by using prefixes (see S3 docs, and Delta docs - delta.randomizeFilePrefixes and delta.randomPrefixLength). However, this is a specific solution that's limited to AWS S3 (and does not apply to ADLS, for example, or to object stores that provide an S3-like API). I've chatted with @Gilad Asulin and @Scott Haines, and it'll be great to get the community's insights w.r.t. adding support for spreading a Delta table across multiple buckets/storage accounts (pros/cons, implications, etc.). One option, for example, is to have the _delta_log reside in the "main" bucket of the table, and the log entries will point to the fully qualified paths of the data files, which could reside in other buckets (in theory, as @Robert mentioned in this thread, "the log and data files could reside in separate buckets ... the protocol allows for fully qualified paths in add actions"). WDYT?
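To make the "fully qualified paths in add actions" option concrete, here is a minimal Python sketch of how a reader could resolve data file paths from log entries. It assumes the `path` field of an `add` action holds either a path relative to the table root or an absolute URI; the bucket names, the `resolve_data_path` helper and the `make_add` helper are hypothetical and exist only for illustration. It ignores URI encoding and is not how any Delta client is actually implemented.

```python
import json
from urllib.parse import urlparse

# Hypothetical "main" bucket that holds the table root and its _delta_log.
TABLE_ROOT = "s3://main-bucket/events"


def resolve_data_path(table_root: str, add_path: str) -> str:
    """Return the full location of a data file referenced by an add action.

    A path that already carries a URI scheme is treated as fully qualified;
    anything else is treated as relative to the table root.
    """
    if urlparse(add_path).scheme:
        return add_path
    return f"{table_root.rstrip('/')}/{add_path}"


def make_add(path: str, size: int) -> dict:
    """Build an illustrative add action (values are made up)."""
    return {
        "add": {
            "path": path,
            "partitionValues": {},
            "size": size,
            "modificationTime": 1690000000000,
            "dataChange": True,
        }
    }


# Two log entries: one relative path, one fully qualified path that points
# at a different (hypothetical) bucket.
commit_lines = [
    json.dumps(make_add("part-00000.snappy.parquet", 1024)),
    json.dumps(make_add("s3://shard-bucket-1/events/part-00001.snappy.parquet", 2048)),
]

resolved = [
    resolve_data_path(TABLE_ROOT, json.loads(line)["add"]["path"])
    for line in commit_lines
]
for location in resolved:
    print(location)
```

The point of the sketch is that the log stays in one place while each entry decides where its data file lives, so a reader needs no extra configuration to find files in other buckets (it does still need credentials for them).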

Dominique Brezinski

07/27/2023, 7:45 PM
In theory the log format allows this. I have the privilege of operating the highest IOPS AWS account against S3. We have single buckets with multiple large Delta Lake tables and single buckets with Delta Lake tables >20PB backed by tens of millions of objects. These tables are written to and read from by low latency, high throughput streams and have very large queries run against them as well. Is your use case bigger?
🤯 2

Itai Yaffe

07/31/2023, 9:18 AM
@Dominique Brezinski - I've seen your LinkedIn comment w.r.t. those multi-PB Delta tables - very impressive indeed! To clarify my original message, my point was that all cloud object stores (AFAIK) have limitations w.r.t. throughput, and while AWS S3 does provide a potential solution (using prefixes), this solution only applies to those using AWS S3 (and not ADLS, for example). I'll adjust my original message accordingly. I'll give you 2 examples, to be more specific:
1. At my company, we have use cases that are within a similar range to what you've described (several PBs in total, with a couple of tables that are a few PBs each). My colleagues (@Tomer Patel, @Gilad Asulin, Yaniv Kunda) and I have presented a couple of those use cases at Data+AI Summit 2023 (see here and here, for example). In one of those use cases, we had to split one of our Delta tables into multiple storage accounts on ADLS (what we called "sharding"), since using Azure's Regional Storage (a private preview feature) simply wasn't good enough for our workloads (see slides 36-43 in this slide deck, the video recording will be available later here).
2. Going forward, we're migrating some workloads to Akamai Connected Cloud, which provides S3-like object storage, and we'll probably need a similar "sharding" solution there as well.
So, I'm looking to brainstorm a more generic solution that'll allow Delta users to spread a single table across multiple buckets/storage accounts/..., without having to implement it on their own for each use case. Hope that makes sense, and I'll be happy to hear everyone's thoughts 🙂
👍 2
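As a rough illustration of the "sharding" idea, the Python sketch below assigns each new data file to one of several storage accounts with a stable hash, and returns the fully qualified location a writer would record in the log. This is an assumption about how such a scheme could work, not the approach presented at the summit; the storage account names, the container name and the `shard_location` helper are hypothetical.

```python
import hashlib

# Hypothetical ADLS Gen2 locations, one per storage account ("shard").
SHARD_ROOTS = [
    "abfss://data@exampleshard0.dfs.core.windows.net/events",
    "abfss://data@exampleshard1.dfs.core.windows.net/events",
    "abfss://data@exampleshard2.dfs.core.windows.net/events",
]


def shard_location(file_name: str, roots=SHARD_ROOTS) -> str:
    """Pick a shard for a data file and return its fully qualified path.

    SHA-256 is used instead of Python's built-in hash() so the assignment
    is stable across processes and runs.
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(roots)
    return f"{roots[index]}/{file_name}"


# Count how many of 1,000 made-up file names land on each shard.
counts = {root: 0 for root in SHARD_ROOTS}
for i in range(1000):
    location = shard_location(f"part-{i:05d}.snappy.parquet")
    root = location.rsplit("/", 1)[0]
    counts[root] += 1

for root, count in counts.items():
    print(root, count)
```

Hashing the file name spreads request load roughly evenly without any coordination between writers. A real design would also have to decide how shards are added or removed, which this sketch does not address.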

Dominique Brezinski

07/31/2023, 2:07 PM
That makes sense. More workloads are getting to this scale, and it seems like S3 may be the only storage system that can truly support the IOPS demands of analytic workloads at this scale without doing some heroics. So figuring out how to make it more mundane would be good.

Itai Yaffe

07/31/2023, 2:12 PM
Thanks! Any thoughts on the cons/implications (I think we agreed on the pros)?

Dominique Brezinski

07/31/2023, 2:25 PM
Operations and debugging issues could be harder if, say, access credentials for one bucket/shard break/expire etc., or performance varies across AZ/region due to an emergent issue. From a core Delta Lake perspective, the only impact I see is slightly increased log size due to fully qualified paths/URLs vs relative paths.
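The log size point can be estimated with a back-of-the-envelope Python sketch. It compares the JSON size of an add action that uses a relative path with one that uses a fully qualified path. The shard root, the file name and the file count are assumptions chosen for illustration, and the figure covers uncompressed JSON commit entries only; checkpoints are stored differently, so the real overhead would likely differ.

```python
import json


def add_action_bytes(path: str) -> int:
    """Size in bytes of an illustrative JSON add action for the given path."""
    action = {
        "add": {
            "path": path,
            "partitionValues": {},
            "size": 1024,
            "modificationTime": 1690000000000,
            "dataChange": True,
        }
    }
    return len(json.dumps(action).encode("utf-8"))


# Hypothetical file name and shard root; neither comes from the thread.
RELATIVE_PATH = "part-00000-3f2a.snappy.parquet"
SHARD_ROOT = "abfss://data@exampleshard0.dfs.core.windows.net/events/"

# Extra bytes per add action when the path is fully qualified.
per_file_overhead = add_action_bytes(SHARD_ROOT + RELATIVE_PATH) - add_action_bytes(RELATIVE_PATH)

# Assumed table size: ten million data files.
NUM_FILES = 10_000_000
total_overhead_mib = per_file_overhead * NUM_FILES / (1024 ** 2)

print(f"extra bytes per add action: {per_file_overhead}")
print(f"extra uncompressed log size for {NUM_FILES:,} files: {total_overhead_mib:.0f} MiB")
```

The overhead per entry is just the length of the prefix that a relative path omits, so it grows linearly with the number of files and with the length of the bucket or account URL.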

Itai Yaffe

07/31/2023, 2:34 PM
All good points, thanks! QQ - what's the appropriate way to keep this discussion going among the community? Should we stick to this Slack thread, or should I open an issue in Delta's git repo?

Dominique Brezinski

07/31/2023, 2:39 PM
Open an issue and then start working up a design doc. There are several examples for major features that you can crib from.
👍 2