Artsiom Yudovin

02/16/2023, 3:40 PM
Hi, has anybody faced an issue like this?
AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down
during a Delta Lake merge? We have a Delta table that is partitioned by columns.
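For context, a minimal sketch of the kind of merge being discussed, an upsert into a partitioned Delta table on S3 (the paths, source format, and join column here are hypothetical, not from the thread):
```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-merge-sketch").getOrCreate()

# Hypothetical incoming batch to upsert into the table.
updates = spark.read.format("parquet").load("s3://my-bucket/staging/events/")

# Hypothetical Delta table, partitioned by columns as described above.
target = DeltaTable.forPath(spark, "s3://my-bucket/tables/events")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```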

Dominique Brezinski

02/16/2023, 7:25 PM
Is it a table with a large number of objects? Or with a very long version history? Have you set the randomPrefix option on the delta table?

Artsiom Yudovin

02/16/2023, 9:54 PM
What do you mean by a large number of objects?
No, we don't have a long version history; this happened on the first merge into an empty Delta table.
And no, we haven't set any random prefixes; the table is just partitioned by columns.

Dominique Brezinski

02/17/2023, 12:44 AM
Was this a brand-new S3 bucket, or one that you have not written to much in some time? Was the merge over a large amount of data? The reason I ask is that S3 has request-rate limits that it starts enforcing by sending you those errors, and if you keep going it will start hard-dropping TCP connections. Those limits are fairly high, around 3,000 ops per second per shard (S3 does not make its sharding visible to the customer in any way), but for a big data set being processed on a reasonably large cluster it is absolutely possible to hit that rate. That is why the random prefix option exists in Delta Lake. S3 buckets do dynamically reshard based on I/O rates, but that takes some time. You can reach out to AWS support; they can check and possibly have your bucket pre-sharded to meet your expected I/O rates.
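Assuming the "randomPrefix option" refers to Delta Lake's documented table properties delta.randomizeFilePrefixes and delta.randomPrefixLength, a minimal sketch of enabling them looks like this (the table path is hypothetical):
```python
# Assumes a SparkSession `spark` with Delta Lake configured.
# With these properties set, Delta writes data files under random key
# prefixes instead of partition-derived ones, spreading write load
# across S3 shards.
spark.sql("""
    ALTER TABLE delta.`s3://my-bucket/tables/events`
    SET TBLPROPERTIES (
        'delta.randomizeFilePrefixes' = 'true',
        'delta.randomPrefixLength' = '4'
    )
""")
```
Note that with randomized prefixes the data files no longer sit under human-readable partition directories; partition pruning still works because it is driven by the transaction log rather than the directory layout.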

Artsiom Yudovin

02/17/2023, 5:46 PM
We have one streaming job that reads from many Kafka topics (more than 200) and stores each topic's data in its own directory, so we make a lot of requests to S3 during the merge. It's hard to say how much data is involved.
I hadn't heard about the random prefix parameters. That looks interesting; we will try it.
Also, we have tried this parameter:
merge.repartitionBeforeWrite.enabled
but we get this exception:
java.util.concurrent.TimeoutException
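For reference, a sketch of how that flag is typically set session-wide, assuming the full conf name is spark.databricks.delta.merge.repartitionBeforeWrite.enabled. A java.util.concurrent.TimeoutException during a merge is often a broadcast-join timeout rather than this flag itself, so raising spark.sql.broadcastTimeout is one thing to try:
```python
# Repartition merge output by the table's partition columns before writing
# (full conf name assumed from Delta Lake's documented settings).
spark.conf.set(
    "spark.databricks.delta.merge.repartitionBeforeWrite.enabled", "true"
)

# Broadcast-join timeout in seconds; the Spark default is 300. Raising it
# is a hedge in case the TimeoutException comes from a slow broadcast.
spark.conf.set("spark.sql.broadcastTimeout", "1200")
```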