lasuomela opened a new issue, #44146: URL: https://github.com/apache/arrow/issues/44146
### Describe the bug, including details regarding any error messages, version, and platform.

Hi, I'm using Huggingface Datasets to read data from disk, which internally uses PyArrow. While reading, I intermittently run into the following type of error:

```
pa_table = opened_stream.read_all()
  File "pyarrow/ipc.pxi", line 762, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: Expected to be able to read 1013159856 bytes for message body, got 513144496
```

Any ideas what typically produces this kind of error with PyArrow, or any debugging tips?

Context: Huggingface Datasets reads the data with an approach equivalent to:

```python
import pyarrow as pa

filename = 'some_path.arrow'
memory_mapped_stream = pa.memory_map(filename)
opened_stream = pa.ipc.open_stream(memory_mapped_stream)
pa_table = opened_stream.read_all()
```

I access the same data concurrently from 4 processes. I assume the problem isn't in the data itself, because at a later time I can read the piece of data where the error occurs without any problem. Furthermore, the error only occurs on one of the computers I'm using. Could it be caused by something system related? For example, the maximum number of memory map areas a process may have is limited by `/proc/sys/vm/max_map_count`.

PyArrow versions tried: 15.0, 17.0

### Component(s)

Python
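For reference, here is a minimal sketch of the concurrent access pattern described above. It is an illustrative reproduction attempt, not the actual Datasets code: `read_table`, the four-process pool, and the placeholder path are all assumptions.

```python
import multiprocessing as mp

import pyarrow as pa


def read_table(filename):
    # Memory-map the file and read the entire IPC stream,
    # mirroring the snippet in the report above.
    with pa.memory_map(filename) as source:
        reader = pa.ipc.open_stream(source)
        return reader.read_all().num_rows


if __name__ == "__main__":
    filename = 'some_path.arrow'  # placeholder path, as in the report
    # Illustrative: read the same file from 4 processes at once to try
    # to trigger the intermittent failure.
    with mp.Pool(4) as pool:
        print(pool.map(read_table, [filename] * 4))
```

If the error reproduces, reading batch by batch instead of calling `read_all()` may help localize which message fails, since the stream reader is iterable. Again a sketch under the same assumptions:

```python
def read_batches(filename):
    # Iterate record batches one at a time so a failure points at a
    # specific message rather than the whole stream.
    with pa.memory_map(filename) as source:
        for i, batch in enumerate(pa.ipc.open_stream(source)):
            print(i, batch.num_rows)
```

The current mapping limit can be inspected with `cat /proc/sys/vm/max_map_count` if the system-limit theory is worth ruling out.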