eladc opened a new issue, #45432:
URL: https://github.com/apache/arrow/issues/45432

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hello,
   
   This is very similar to bug [#36007](https://github.com/apache/arrow/issues/36007).
   
   The requesting machine is in the same region as the S3 bucket.
   joblib is used to parallelize the download, with up to 56 threads.
   It is very difficult to reproduce. It happens at least once a day, to random users who run the same download code on different Parquet files.
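Because the failures are intermittent, one possible mitigation is to retry reads that fail this way and to cap how many run at once. This does not explain the root cause. The sketch below is illustrative only: `read_with_retry`, `read_all`, and their default values are hypothetical names and numbers. It also swaps joblib for the standard library's `concurrent.futures.ThreadPoolExecutor` so that it stays self-contained. It assumes pyarrow raises `OSError` for IOError statuses such as the curl timeout in the traceback.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], T],
    max_attempts: int = 4,      # hypothetical default
    base_delay: float = 1.0,    # seconds before the first retry (upper bound)
    max_delay: float = 30.0,    # cap on any single backoff
) -> T:
    """Call read(), retrying on OSError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return read()
        except OSError:
            # Assumption: pyarrow surfaces IOError statuses (e.g. curl timeouts) as OSError.
            if attempt >= max_attempts:
                raise  # out of attempts: re-raise the last error
            # Full jitter keeps threads that failed together from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
            attempt += 1


def read_all(
    paths: Iterable[str],
    read_one: Callable[[str], T],
    max_workers: int = 8,  # hypothetical cap, far below 56 threads
) -> List[T]:
    """Read every path on a bounded thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: read_with_retry(lambda: read_one(p)), paths))
```

For example, `read_one` might be `lambda p: pq.read_table(p, filesystem=s3)`, with `pyarrow.parquet` imported as `pq` and `s3` an existing `pyarrow.fs.S3FileSystem`. Note that this catches every `OSError`, so it would also retry errors such as `FileNotFoundError`. The worker count of 8 is a guess, not a tuned value.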
   
   Installed packages:
   - **arrow**: 1.3.0
   - **pyarrow**: 14.0.1
   
   
   ```
     File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 3003, in read_table
       return dataset.read(columns=columns, use_threads=use_threads,
     File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 2631, in read
       table = self._dataset.to_table(
     File "pyarrow/_dataset.pyx", line 556, in pyarrow._dataset.Dataset.to_table
     File "pyarrow/_dataset.pyx", line 3713, in pyarrow._dataset.Scanner.to_table
     File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   Error: IOError: AWS Error NETWORK_CONNECTION during GetObject operation: curlCode: 28, Timeout was reached
   ```
   
   How can I debug this further?
   
   Thank you.
   
   
   ### Component(s)
   
   Python

