ookumuso opened a new pull request, #11052: URL: https://github.com/apache/iceberg/pull/11052
Iceberg workloads that exceed S3's per-prefix request limits receive HTTP 503 (SlowDown) error responses. The ideal customer experience is for Iceberg to retry these 503 errors persistently, so that the workload keeps making progress and eventually triggers S3 autoscaling. Customers using Iceberg OSS with S3FileIO rely on the SDK's default retry mechanism, which is limited to 3 retries. In our load tests, 3 retries is insufficient and these workloads fail almost immediately. This change aims to improve the customer experience by providing configuration parameters for S3 retries and by introducing better defaults to prevent fast failures.

This change also addresses some of the concerns raised in a previous pull request (https://github.com/apache/iceberg/pull/8221). That feedback particularly calls out that the number of configurations introduced might be hard for customers to reason about, and that retries of IOExceptions should be independent of SDK retries. In this commit, we're proposing a new implementation which addresses this feedback.

There are three notable changes here:
* Introduced three configuration properties to control the retry behavior: the retry count, minimum backoff time, and maximum backoff time.
* Better default values for S3 retries, which allow most workloads to succeed.
* The retry code introduced by #8221 was simplified to only retry IOExceptions thrown outside of the SDK call (e.g. SSLException, SocketTimeoutException).

Additional retry configurations were applied based on load testing discoveries:
* Work around an SDK bug to allow retries on 503s from HEAD requests. (https://github.com/aws/aws-sdk-java-v2/issues/5414)
* Add XMLStreamException as a retryable exception. This exception appears to occur when there is a socket exception while the error XML is being parsed. (https://github.com/aws/aws-sdk-java-v2/issues/5442)

This change was originally authored by @drewschleit; I am following up on his behalf.
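For readers who want a picture of how the three knobs described above (retry count, minimum backoff, maximum backoff) could interact, here is a minimal sketch. It is not the code in this pull request and does not use the AWS SDK: the class `BoundedRetry` and its methods are hypothetical names chosen for illustration, the backoff is assumed to double from the minimum up to the maximum, and only IOExceptions are treated as retryable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not Iceberg or AWS SDK code.
final class BoundedRetry {
  private final int maxRetries;     // retries allowed after the first attempt
  private final long minBackoffMs;  // delay before the first retry
  private final long maxBackoffMs;  // upper bound on any single delay

  BoundedRetry(int maxRetries, long minBackoffMs, long maxBackoffMs) {
    if (maxRetries < 0 || minBackoffMs < 0 || maxBackoffMs < minBackoffMs) {
      throw new IllegalArgumentException("invalid retry settings");
    }
    this.maxRetries = maxRetries;
    this.minBackoffMs = minBackoffMs;
    this.maxBackoffMs = maxBackoffMs;
  }

  // Delay before retry number `retry` (1-based): the minimum backoff,
  // doubled once per earlier retry, and never more than the maximum.
  long backoffMs(int retry) {
    long delay = minBackoffMs;
    for (int i = 1; i < retry && delay < maxBackoffMs; i++) {
      // Compare against max / 2 first so the doubling cannot overflow.
      delay = delay > maxBackoffMs / 2 ? maxBackoffMs : delay * 2;
    }
    return Math.min(delay, maxBackoffMs);
  }

  // Assumption for this sketch: only IOException and its subclasses
  // (such as SSLException and SocketTimeoutException) are retried.
  static boolean isRetryable(Throwable t) {
    return t instanceof IOException;
  }

  // Runs `op`, retrying retryable failures up to maxRetries times.
  <T> T call(Callable<T> op) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        if (!isRetryable(e) || attempt >= maxRetries) {
          throw e; // not retryable, or the retry budget is exhausted
        }
        Thread.sleep(backoffMs(attempt + 1));
      }
    }
  }
}
```

With `new BoundedRetry(3, 1, 4)`, an operation that keeps failing with an IOException is attempted four times in total, sleeping 1 ms, 2 ms, and 4 ms between attempts. XMLStreamException does not extend IOException, so a policy like the one described above would have to list it separately; a production policy would typically also add jitter to the delays.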
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org