metadaddy opened a new pull request, #1568:
URL: https://github.com/apache/iceberg-python/pull/1568

   Similarly to #218, we see occasional timeout errors when writing data to 
S3-compatible object storage:
   
   ```
   When uploading part for key 'drivestats/data/date_month=2014-08/00000-0-9c7baab5-af18-4558-ae10-1678aa90b6a5.parquet' in bucket 'drivestats-iceberg': AWS Error NETWORK_CONNECTION during UploadPart operation: curlCode: 28, Timeout was reached
   ```
   
   [I don't believe the issue is specific to my use of [Backblaze B2](https://www.backblaze.com/cloud-storage) rather than Amazon S3: while researching this issue, I saw references to similar error messages with Amazon S3.]
   
   The issue happens when the underlying `PUT` operation takes longer than the 
request timeout, which is [set to a default of 3 seconds in the AWS C++ 
SDK](https://github.com/aws/aws-sdk-cpp/blob/c9eaae91b9eaa77f304a12cd4b15ec5af3e8a726/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp#L184)
 used by Arrow via PyArrow.
   
   The changes in this PR allow configuration of `s3.request-timeout` when 
working directly or indirectly with `pyiceberg.io.pyarrow.PyArrowFileIO`, just 
as #218 allowed configuration of `s3.connect-timeout`.
   
   For example, when creating a catalog:
   
   ```python
   catalog = load_catalog(
       "docs",
       **{
           "uri": "http://127.0.0.1:8181",
           "s3.endpoint": "http://127.0.0.1:9000",
           "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
           "s3.access-key-id": "admin",
           "s3.secret-access-key": "password",
           "s3.request-timeout": 5.0,
           "s3.connect-timeout": 20.0,
       }
   )
   ```
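
To illustrate how such properties could reach PyArrow, here is a minimal sketch, not the actual diff in this PR. It assumes the property names from the example above and relies on `pyarrow.fs.S3FileSystem` accepting `request_timeout` and `connect_timeout` keyword arguments (in seconds). The helper name `s3_timeout_kwargs` is hypothetical.

```python
from typing import Any, Dict

# Property names as used in the catalog example above.
S3_REQUEST_TIMEOUT = "s3.request-timeout"
S3_CONNECT_TIMEOUT = "s3.connect-timeout"


def s3_timeout_kwargs(properties: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical helper: translate catalog properties into timeout kwargs.

    The returned keys match the `request_timeout` and `connect_timeout`
    keyword arguments of `pyarrow.fs.S3FileSystem`. Properties that are not
    set are omitted, so the underlying SDK defaults still apply.
    """
    kwargs: Dict[str, float] = {}
    request_timeout = properties.get(S3_REQUEST_TIMEOUT)
    if request_timeout is not None:
        # Accept both numbers and numeric strings, since properties may come
        # from a config file.
        kwargs["request_timeout"] = float(request_timeout)
    connect_timeout = properties.get(S3_CONNECT_TIMEOUT)
    if connect_timeout is not None:
        kwargs["connect_timeout"] = float(connect_timeout)
    return kwargs
```

The resulting dictionary could then be passed along as `S3FileSystem(**s3_timeout_kwargs(properties), ...)` together with the endpoint and credentials. Leaving unset timeouts out of the dictionary, rather than passing `None`, keeps the defaults under the control of the underlying SDK.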
   



