danielcweeks commented on PR #11052:
URL: https://github.com/apache/iceberg/pull/11052#issuecomment-2379484397

   @ookumuso After thinking about this a little more, I'm increasingly 
concerned about the high default retry count.  Looked at in isolation, it 
seems like the right thing to do, on the assumption that we're being 
throttled and S3 will repartition, so everything will be fine going forward.
   
   However, in aggregate, there's no guarantee that S3 will repartition, and 
in the degenerate cases (lots of small files, but a rather small overall 
size), I'm not convinced that it actually will.  Higher retries at that point 
just mask the underlying issue, and the net result is slower overall 
performance.
   
   My recommendation is:
   
   1. Set the default lower (maybe 5 retries), but allow it to be configured 
if there are problems for specific workloads/queries.
   2. Log in the retry handler when throttling is happening (I know the SDK 
logs this, but I don't think that's typically enabled by default, so we need 
to surface what's happening).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
