kubat-square-sense opened a new issue, #45236: URL: https://github.com/apache/arrow/issues/45236
### Describe the bug, including details regarding any error messages, version, and platform.

We noticed high memory consumption while reading Parquet files with pyarrow 18.1. Loading a 600 KB Parquet file into a 22 MB table consumes over 1 GB of memory. On three different machines (WSL, Linux, macOS), profiling with memray showed peak memory of 1 GB, 1.1 GB and 1.8 GB. Running the same code with pyarrow 17 consumes less than 200 MB.

It's quite simple to reproduce. I've attached a dummy Parquet file that consumes slightly less, but still over 1 GB.

```python
import pyarrow.parquet as pq

data = pq.read_table('test.parquet')
print(data.nbytes / 1024**2)  # in-memory table size in MiB
```

[test.zip](https://github.com/user-attachments/files/18395000/test.zip)

### Component(s)

Python
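For anyone who wants to compare versions without memray, here is a rough sketch of one way to check the peak numbers. It is not the method used for the figures above. The helper names (`peak_rss_mb`, `measure_peak_growth`, `report`) are made up for illustration, and it relies on the standard-library `resource` module, so it is Unix-only (Linux, WSL, macOS). The `report` helper also prints what Arrow's default memory pool says about itself, which may help separate Arrow allocations from overall process memory.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024**2 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_growth(fn, *args, **kwargs):
    """Call fn(*args, **kwargs); return (result, growth of peak RSS in MiB)."""
    before = peak_rss_mb()
    result = fn(*args, **kwargs)
    return result, peak_rss_mb() - before


def report(path: str) -> None:
    """Hypothetical helper: read a Parquet file and print memory figures."""
    # Imported lazily so the helpers above work without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table, growth = measure_peak_growth(pq.read_table, path)
    pool = pa.default_memory_pool()
    print("pyarrow version:", pa.__version__)
    print("memory pool backend:", pool.backend_name)
    print(f"table size: {table.nbytes / 1024**2:.1f} MiB")
    print(f"peak RSS growth: {growth:.1f} MiB")
    # max_memory() can be None if the pool does not track its peak.
    print("pool peak (bytes):", pool.max_memory())
```

Calling `report('test.parquet')` in a fresh interpreter under each pyarrow version should give comparable figures. Peak RSS only ever increases, so the growth reads as zero if the process already peaked higher before the call.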