ookumuso opened a new pull request, #11052:
URL: https://github.com/apache/iceberg/pull/11052

   Iceberg workloads that exceed S3's per-prefix request rate limits receive 
HTTP 503 (SlowDown) error responses. The ideal customer experience is for 
Iceberg to retry these 503 errors persistently, so that the workload keeps 
making progress and eventually triggers S3 autoscaling.
   
   Customers using Iceberg OSS with S3FileIO rely on the SDK's default retry 
mechanism, which is limited to 3 retries. We observe in our load tests that 3 
retries are insufficient, and these workloads fail almost immediately. This 
change aims to improve the customer experience by providing configuration 
parameters for S3 retries and by introducing better defaults to prevent fast 
failures.
   
   This change also addresses some of the concerns raised in a previous pull 
request (https://github.com/apache/iceberg/pull/8221). That feedback 
particularly calls out that the number of configurations introduced might be 
hard for customers to reason about, and that retries of IOExceptions should 
be independent of SDK retries. In this commit, we're proposing a new 
implementation which addresses this feedback. There are three notable changes 
here:
   
   * Introduced three configuration properties to control the retry behavior: 
the retry count, minimum backoff time, and maximum backoff time.
   * Better default values for S3 retries, which allow most workloads to 
succeed. 
   * The retry code introduced by #8221 was simplified to only retry 
IOExceptions thrown outside of the SDK call (i.e. SSLException, 
SocketTimeoutException).
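
   As a rough illustration of how the three settings above interact, here is 
a minimal sketch of a capped exponential backoff loop that only retries the 
two IOException types named in the last bullet. The class `RetrySketch`, its 
method names, and the doubling policy are assumptions made for this example; 
they are not the code or the property names added by this PR.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import javax.net.ssl.SSLException;

// Hypothetical helper for illustration only; not the class added by this PR.
final class RetrySketch {
  private RetrySketch() {}

  // Capped exponential backoff: minWaitMs doubled once per prior attempt,
  // never exceeding maxWaitMs. The doubling factor is an assumption.
  static long backoffMs(int attempt, long minWaitMs, long maxWaitMs) {
    long wait = minWaitMs;
    for (int i = 0; i < attempt && wait < maxWaitMs; i++) {
      wait *= 2;
    }
    return Math.min(wait, maxWaitMs);
  }

  // Only the IOExceptions named in the description are treated as retryable.
  static boolean isRetryable(Exception e) {
    return e instanceof SSLException || e instanceof SocketTimeoutException;
  }

  // Runs the call, retrying up to numRetries times on retryable exceptions.
  // Any other exception, or exhaustion of the retry budget, is rethrown.
  static <T> T callWithRetry(
      Callable<T> call, int numRetries, long minWaitMs, long maxWaitMs) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (attempt >= numRetries || !isRetryable(e)) {
          throw e;
        }
        Thread.sleep(backoffMs(attempt, minWaitMs, maxWaitMs));
      }
    }
  }
}
```

   With a minimum of 100 ms and a maximum of 1000 ms, this sketch waits 100, 
200, 400, 800 and then 1000 ms between attempts, so the retry count and the 
maximum backoff together bound how long a throttled workload keeps trying.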
   
   There were additional retry configurations applied based on load testing 
discoveries:
   
   * Work around an SDK bug to allow for retries on 503s from HEAD requests 
(https://github.com/aws/aws-sdk-java-v2/issues/5414).
   * Add XMLStreamException as a retryable exception. This exception appears 
to occur when there's a socket exception during parsing of the error XML 
(https://github.com/aws/aws-sdk-java-v2/issues/5442).
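
   The two conditions above could be expressed as a retry predicate roughly 
like the sketch below. It is plain Java and deliberately avoids the SDK's own 
retry types; `RetryableCheckSketch` and its methods are hypothetical names, 
and checking the bare status code assumes that a HEAD response carries no 
error body to inspect.

```java
import javax.xml.stream.XMLStreamException;

// Hypothetical predicate for illustration only; not the SDK's retry API.
final class RetryableCheckSketch {
  private RetryableCheckSketch() {}

  static final int HTTP_SERVICE_UNAVAILABLE = 503;

  // A HEAD response has no body, so the status code is the only signal
  // available for detecting throttling in this sketch.
  static boolean isThrottled(int statusCode) {
    return statusCode == HTTP_SERVICE_UNAVAILABLE;
  }

  // Walks the cause chain (bounded, to guard against cycles) looking for
  // an exception of the given type.
  static boolean hasCause(Throwable failure, Class<? extends Throwable> type) {
    int depth = 0;
    for (Throwable cur = failure; cur != null && depth < 32; cur = cur.getCause()) {
      if (type.isInstance(cur)) {
        return true;
      }
      depth++;
    }
    return false;
  }

  // Retry on a 503, or when error-XML parsing failed somewhere in the chain.
  static boolean shouldRetry(int statusCode, Throwable failure) {
    return isThrottled(statusCode) || hasCause(failure, XMLStreamException.class);
  }
}
```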
   
   
   This change was originally authored by @drewschleit; I am following up on 
his behalf.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

