kopczynski-9livesdata opened a new issue, #45882: URL: https://github.com/apache/arrow/issues/45882
### Describe the bug, including details regarding any error messages, version, and platform.

When running a script like the following:

```python
import pyarrow
import pyarrow.dataset as ds
import pyarrow.fs
from memory_profiler import profile


@profile
def load_parquet():
    print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
    fs = pyarrow.fs.S3FileSystem()
    s3_dataset = ds.dataset("[my_s3_bucket]/10g.parquet", filesystem=fs)
    scanner = s3_dataset.scanner()
    scanner.head(10)
    del scanner
    del s3_dataset
    del fs
    pyarrow.default_memory_pool().release_unused()
    print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")


if __name__ == "__main__":
    load_parquet()
```

I keep ending up with a significant amount of memory allocated that I cannot force `pyarrow` to release back to the OS. The memory profile and output from this script look like this:

```
0.0
1516.7496948242188
Filename: src/mem.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6    125.2 MiB    125.2 MiB           1   @profile
     7                                         def load_parquet():
     8    125.2 MiB      0.0 MiB           1       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
     9    130.2 MiB      5.0 MiB           1       fs = pyarrow.fs.S3FileSystem()
    10    140.7 MiB     10.5 MiB           1       s3_dataset = ds.dataset("dotdata-ddent-tkopczynski-dev/10g.parquet", filesystem=fs)
    11    140.9 MiB      0.1 MiB           1       scanner = s3_dataset.scanner()
    12   1700.1 MiB   1559.2 MiB           1       scanner.head(10)
    13   1701.0 MiB      0.9 MiB           1       del scanner
    14   1701.5 MiB      0.5 MiB           1       del s3_dataset
    15   1701.6 MiB      0.1 MiB           1       del fs
    16   1701.3 MiB     -0.3 MiB           1       pyarrow.default_memory_pool().release_unused()
    17   1701.3 MiB      0.0 MiB           1       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
```

Is there anything I'm doing wrong, or is this a memory leak?

OS: Ubuntu 22.04.5 LTS
pyarrow version: 19.0.1

### Component(s)

Python
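As a follow-up diagnostic I can run if it helps (a sketch only; it assumes the `memory_pool` argument of `Dataset.scanner()` routes the scan's allocations through the given pool, and that the retained RSS comes from the allocator backend rather than from live Arrow buffers):

```python
# Diagnostic sketch, not a fix. Assumptions: Dataset.scanner(memory_pool=...)
# routes scan allocations through the given pool, and the high resident memory
# is allocator caching (jemalloc/mimalloc) rather than live Arrow allocations.
# "[my_s3_bucket]" is a placeholder as in the script above.
import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import fs


def read_head(pool: pa.MemoryPool) -> None:
    filesystem = fs.S3FileSystem()
    dataset = ds.dataset("[my_s3_bucket]/10g.parquet", filesystem=filesystem)
    scanner = dataset.scanner(memory_pool=pool)
    scanner.head(10)
    pool.release_unused()
    # If bytes_allocated() drops back near zero while RSS stays high,
    # the memory is being held by the allocator, not by Arrow objects.
    print(
        f"backend={pool.backend_name} "
        f"bytes_allocated={pool.bytes_allocated() / (1024 * 1024):.1f} MiB "
        f"max_memory={pool.max_memory() / (1024 * 1024):.1f} MiB"
    )


if __name__ == "__main__":
    try:
        # Ask jemalloc (if it is the backend) to return dirty pages eagerly.
        pa.jemalloc_set_decay_ms(0)
    except NotImplementedError:
        pass  # Arrow build without jemalloc support
    read_head(pa.default_memory_pool())
    # Compare with the plain malloc-based system pool.
    read_head(pa.system_memory_pool())
```

If the RSS only stays high with the jemalloc/mimalloc backends but not with the system pool, that would point at allocator page caching rather than a leak in the dataset/scanner code.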