ookumuso commented on PR #10433: URL: https://github.com/apache/iceberg/pull/10433#issuecomment-2334900818
> > The S3 team (@ookumuso) just published what they have developed internally and is now used in Amazon EMR, Athena, GlueETL distributions of Iceberg: #11052, is there a way we can see how we could collaborate on this?
>
> Sorry for the late reply, Absolutely @jackye1995 ! So it seems like the overlap between this and #11052 is primarily the retrying of the reading of the input stream. #11052 also includes some configurable properties and different configurations for the S3Client.
>
> For the retrying of the input stream, I think the key thing I was trying to achieve was adding the RetryableInputStream primitive so that other FileIOs like GCS/Azure can also use that. How do we feel about that @jackye1995 @ookumuso ? And then the other PR we can talk about the configuration/default updates?

No worries, and thanks for following up! I am okay with using the RetryableInputStream as long as we can close this out soon, since a lot of people seem to be hitting the same issue. I see two options here:

- I can exclude the InputStream changes entirely from my PR and keep only the s3-retry config changes. It might be good to use the same retry configs that we provide in your RetryableInputStream as well.
- I can keep both, depending on your timelines, and my InputStream changes can later be replaced by yours. The tests are mostly the same, but this might give you some conflicts on InputStream.

What do you think @jackye1995 @amogh-jahagirdar ?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org