SandeepSinghGahir commented on PR #10433: URL: https://github.com/apache/iceberg/pull/10433#issuecomment-2353901444
> > > > The S3 team (@ookumuso) just published what they have developed internally and is now used in Amazon EMR, Athena, GlueETL distributions of Iceberg: #11052, is there a way we can see how we could collaborate on this? > > > > > > > > > Sorry for the late reply, Absolutely @jackye1995 ! So it seems like the overlap between this and #11052 is primarily the retrying of the reading of the input stream. #11052 also includes some configurable properties and different configurations for the S3Client. > > > For the retrying of the input stream, I think the key thing I was trying to achieve was adding the RetryableInputStream primitive so that other FileIOs like GCS/Azure can also use that. How do we feel about that @jackye1995 @ookumuso ? And then the other PR we can talk about the configuration/default updates? > > > > > > No worries and thanks for following up! I am okay with using the RetryableInputStream as long as we can we can close this out soonish since there seems to be a lot of people hitting the same issue. Have 2 options here: > > > > * I can exclude the InputStream changes entirely from my PR and just keep the s3-retry config changes. Might be good to use the same retry configs that we provide in your RetryableInputStream as well > > * I can keep both depending on your timelines but can be replaced by your change, tests are mostly same but might give you some conflicts on InputStream. > > > > What do you think @jackye1995 @amogh-jahagirdar ? > > Sounds good, yeah let me get this to closure soon since it has been open for a while. I think I'd propose option 1 and we rebase #11052 on top of that. My only take is we really should try and minimize as much configuration as possible in Iceberg since the S3 properties is already quite complex. But I think we can address that in the other PR since not retrying the input stream is a concrete issue that quite a few folks are hitting. Sounds good. Yes, a lot of folks are hitting this issue, also mentioned here: https://github.com/apache/iceberg/issues/10340 Do we have any expected time to merge this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org