Kesanov opened a new issue, #34274:
URL: https://github.com/apache/arrow/issues/34274

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   My Parquet file is 40 GB on disk; once fully loaded, the table occupies only about 10 GB in memory.
   
   However, while reading the file from disk with `pyarrow.parquet.read_table`, memory usage temporarily grows to 150 GB, and while reading it with `pandas.read_parquet(.., engine='pyarrow')`, memory usage temporarily grows to 300 GB.
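   
   A minimal sketch of how the peak allocation can be observed via Arrow's memory pool (using a small stand-in file, since the 40 GB file cannot be shared; the file name `data.parquet` is illustrative):
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # Write a small table so the script is self-contained;
   # in the real report this would be the 40 GB file.
   table = pa.table({"x": list(range(1000))})
   pq.write_table(table, "data.parquet")
   
   pool = pa.default_memory_pool()
   loaded = pq.read_table("data.parquet")
   
   print("rows:", loaded.num_rows)
   print("in-memory size (bytes):", loaded.nbytes)
   # Peak bytes ever allocated from this pool, including any
   # temporary buffers used during decoding.
   print("peak pool allocation (bytes):", pool.max_memory())
   ```
   
   Comparing `loaded.nbytes` with `pool.max_memory()` after the read shows how much transient memory the Parquet decode path used beyond the final table size.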
   
   ### Component(s)
   
   Parquet, Python

