anjali-chadha opened a new issue, #40539:
URL: https://github.com/apache/arrow/issues/40539
### Describe the bug, including details regarding any error messages, version, and platform.
Hi there!
We are using the PyArrow library to read files from an S3 bucket, and we're
encountering an intermittent error:
`OSError: When reading information for key '<REDACTED>' in bucket
'<REDACTED>': AWS Error NETWORK_CONNECTION during HeadObject operation:
curlCode: 6, Couldn't resolve host name`
Please note that this error doesn't occur consistently, and the S3 bucket
path is valid.
The reference code we're using is as follows:
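```
import pyarrow as pa
import pyarrow.fs  # load the submodule so that pa.fs is available
import pyarrow.json as pj

uri = "s3://my-bucket/my-prefix/foo.json"
# Resolve the URI into a filesystem object and a path within it.
fs, path = pa.fs.FileSystem.from_uri(uri)
with fs.open_input_file(path) as f:
    tbl = pj.read_json(f)
```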
Error Details:
```
2024-03-12T13:06:05.616-07:00 [7]: with fs.open_input_file(path) as f:
2024-03-12T13:06:05.616-07:00 [7]: File "pyarrow/_fs.pyx", line 780, in pyarrow._fs.FileSystem.open_input_file
2024-03-12T13:06:05.616-07:00 [7]: File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
2024-03-12T13:06:05.616-07:00 [7]: File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
2024-03-12T13:06:05.616-07:00 [7]:OSError: When reading information for key '<REDACTED>' in bucket '<REDACTED>': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 6, Couldn't resolve host name
```
Could you please provide any suggestions on how to handle such intermittent
network connectivity errors while reading from S3?
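For reference, this is the kind of application-level workaround we are considering: a minimal sketch that retries the whole open-and-read step with exponential backoff whenever an `OSError` is raised. The helper names (`retry_on_oserror`, `read_json_from_s3`), the attempt count, and the delays are our own assumptions, not part of PyArrow, and we have not confirmed that retrying at this level is the recommended approach.

```python
import random
import time


def retry_on_oserror(func, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on OSError with exponential backoff and jitter.

    Hypothetical helper; attempts and base_delay are illustrative defaults.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except (FileNotFoundError, PermissionError):
            # A missing key or denied access is not transient: do not retry.
            raise
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt; jitter spreads out retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


def read_json_from_s3(uri):
    """Hypothetical wrapper around the snippet above, retried as a unit."""
    # Imported lazily so the retry helper itself does not depend on pyarrow.
    import pyarrow.json as pj
    from pyarrow import fs as pafs

    def _read():
        filesystem, path = pafs.FileSystem.from_uri(uri)
        with filesystem.open_input_file(path) as f:
            return pj.read_json(f)

    return retry_on_oserror(_read)
```

With this in place, the call site would become `tbl = read_json_from_s3("s3://my-bucket/my-prefix/foo.json")`. The filesystem is re-created inside `_read` on each attempt, on the assumption that a fresh client gives DNS resolution another chance; we are not sure whether that is actually necessary.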
### Component(s)
Parquet, Python