amogh-jahagirdar commented on PR #10433:
URL: https://github.com/apache/iceberg/pull/10433#issuecomment-2334992007

   > > > The S3 team (@ookumuso) just published what they have developed 
internally and is now used in Amazon EMR, Athena, GlueETL distributions of 
Iceberg: #11052, is there a way we can see how we could collaborate on this?
   > > 
   > > 
   > > Sorry for the late reply, Absolutely @jackye1995 ! So it seems like the 
overlap between this and #11052 is primarily the retrying of the reading of the 
input stream. #11052 also includes some configurable properties and different 
configurations for the S3Client.
   > > For the retrying of the input stream, I think the key thing I was trying 
to achieve was adding the RetryableInputStream primitive so that other FileIOs 
like GCS/Azure can also use that. How do we feel about that @jackye1995 
@ookumuso ? And then the other PR we can talk about the configuration/default 
updates?
   > 
   > No worries and thanks for following up! I am okay with using the 
RetryableInputStream as long as we can we can close this out soonish since 
there seems to be a lot of people hitting the same issue. Have 2 options here:
   > 
   > * I can exclude the InputStream changes entirely from my PR and just keep 
the s3-retry config changes. Might be good to use the same retry configs that 
we provide in your RetryableInputStream as well
   > * I can keep both depending on your timelines but can be replaced by your 
change, tests are mostly same but might give you some conflicts on InputStream.
   > 
   > What do you think @jackye1995 @amogh-jahagirdar ?
   
   Sounds good, yeah let me get this to closure soon since it has been open for 
a while. I think I'd propose option 1 and we rebase #11052  on top of that. My 
only take is we really should try and minimize as much configuration as 
possible in Iceberg since the S3 properties is already quite complex. But I 
think we can address that in the other PR since not retrying the input stream 
is a concrete issue that quite a few folks are hitting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to