kopczynski-9livesdata opened a new issue, #45882:
URL: https://github.com/apache/arrow/issues/45882

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   When I run a script like the following:
   ```
   import pyarrow.dataset as ds
   import pyarrow.fs
   from memory_profiler import profile
   import pyarrow
   
   @profile
   def load_parquet():
       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
       fs = pyarrow.fs.S3FileSystem()
       s3_dataset = ds.dataset("[my_s3_bucket]/10g.parquet", filesystem=fs)
       scanner = s3_dataset.scanner()
       scanner.head(10)
       del scanner
       del s3_dataset
       del fs
       pyarrow.default_memory_pool().release_unused()
       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
   
   if __name__ == "__main__":
       load_parquet()
   ```
   
   I keep seeing a significant amount of memory allocated that I cannot force `pyarrow` to release back to the OS.
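   
   (One thing I wondered about, in case it narrows things down: whether the retained memory is just allocator caching rather than live Arrow data. The sketch below is only what I had in mind for checking that, not something I have confirmed as a workaround.)
   ```
   import pyarrow
   
   # Which allocator backend is Arrow using? (jemalloc, mimalloc, or system)
   print(pyarrow.default_memory_pool().backend_name)
   
   # If the backend is jemalloc, ask it to return freed pages to the OS
   # immediately instead of keeping them cached (0 ms decay).
   pyarrow.jemalloc_set_decay_ms(0)
   
   # Alternatively, route Arrow allocations through the plain system
   # allocator before doing any work, to compare the RSS behaviour.
   pyarrow.set_memory_pool(pyarrow.system_memory_pool())
   ```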
   
   The memory profile and output from this script look like this:
   ```
   0.0
   1516.7496948242188
   Filename: src/mem.py
   
   Line #    Mem usage    Increment  Occurrences   Line Contents
   =============================================================
        6    125.2 MiB    125.2 MiB           1   @profile
        7                                         def load_parquet():
        8    125.2 MiB      0.0 MiB           1       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
        9    130.2 MiB      5.0 MiB           1       fs = pyarrow.fs.S3FileSystem()
       10    140.7 MiB     10.5 MiB           1       s3_dataset = ds.dataset("dotdata-ddent-tkopczynski-dev/10g.parquet", filesystem=fs)
       11    140.9 MiB      0.1 MiB           1       scanner = s3_dataset.scanner()
       12   1700.1 MiB   1559.2 MiB           1       scanner.head(10)
       13   1701.0 MiB      0.9 MiB           1       del scanner
       14   1701.5 MiB      0.5 MiB           1       del s3_dataset
       15   1701.6 MiB      0.1 MiB           1       del fs
       16   1701.3 MiB     -0.3 MiB           1       pyarrow.default_memory_pool().release_unused()
       17   1701.3 MiB      0.0 MiB           1       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
   
   ```
   
   Is there anything I'm doing wrong, or is this a memory leak?
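   
   (A follow-up I can run if it helps, sketched below: since `pyarrow.total_allocated_bytes()` still reports ~1.5 GiB after the function returns, I would like to rule out something keeping a reference to the buffers alive, e.g. a reference cycle that an explicit `gc.collect()` would clear. This is just an idea, not something I have verified.)
   ```
   import gc
   
   import pyarrow
   
   load_parquet()
   
   # Force collection of any reference cycles that might still hold Arrow
   # buffers, then give the pool another chance to return pages to the OS.
   gc.collect()
   pyarrow.default_memory_pool().release_unused()
   
   # Does Arrow itself still consider ~1.5 GiB allocated?
   print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
   ```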
   
   OS: Ubuntu 22.04.5 LTS
   pyarrow version: 19.0.1
   
   ### Component(s)
   
   Python

