Ark-kun opened a new issue, #45971: URL: https://github.com/apache/arrow/issues/45971
### Describe the bug, including details regarding any error messages, version, and platform.

We get data batches from BigQuery and write them to Parquet. The Parquet writer eats up all memory and crashes the pod. It is not a jemalloc artifact or anything like that, since we do not see the issue when we periodically create new `ParquetWriter` instances.

This code leaks:

```py
from google.cloud import bigquery
from google.cloud.bigquery import _pandas_helpers
from pyarrow import parquet

client = bigquery.Client(project=...)
job = client.get_job(job_id=...)
result = job.result()
arrow_schema = _pandas_helpers.bq_to_arrow_schema(result.schema)
bqstorage_client = client._ensure_bqstorage_client()

with parquet.ParquetWriter(where="result.parquet", schema=arrow_schema) as writer:
    for batch in result.to_arrow_iterable(
        bqstorage_client=bqstorage_client,
        max_queue_size=1,
        max_stream_count=1,
    ):
        writer.write_batch(batch)
```

Initially we thought that the bug was in BigQuery, but we were wrong: https://github.com/googleapis/python-bigquery/issues/2151

Proof: changing

```py
with parquet.ParquetWriter(where="result.parquet", schema=arrow_schema) as writer:
    for batch in result.to_arrow_iterable(...):
        writer.write_batch(batch)
```

to

```py
for batch in result.to_arrow_iterable(...):
    with parquet.ParquetWriter(where="result.parquet", schema=arrow_schema) as writer:
        writer.write_batch(batch)
```

fixes the memory leak.

Versions: `pyarrow==19.0.1`, `pyarrow==16.1.0`

### Component(s)

Parquet
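
For reference, a minimal sketch of the writer-rotation workaround mentioned above (periodically creating new `ParquetWriter` instances). The `iter_batches` generator and the `batches_per_file` value are hypothetical placeholders standing in for `result.to_arrow_iterable(...)` from the report; this is an illustration under those assumptions, not the reporter's exact code.

```py
# Sketch of the writer-rotation workaround: instead of keeping one
# ParquetWriter alive for the whole stream, close it and open a new writer
# on a new part file every `batches_per_file` batches, so memory held by
# the writer is released periodically.
# `iter_batches()` and `batches_per_file` are placeholders, not from the report.
import itertools

import pyarrow as pa
from pyarrow import parquet


def iter_batches():
    # Placeholder for result.to_arrow_iterable(...) in the original report.
    schema = pa.schema([("x", pa.int64())])
    for i in range(10):
        yield pa.record_batch([pa.array(range(i * 3, i * 3 + 3))], schema=schema)


def write_rotating(batches, schema, batches_per_file=100):
    """Write the batch stream to result-00000.parquet, result-00001.parquet, ..."""
    part = 0
    batches = iter(batches)
    while True:
        # Take the next chunk of batches; an empty chunk means the stream is done.
        chunk = list(itertools.islice(batches, batches_per_file))
        if not chunk:
            break
        path = f"result-{part:05d}.parquet"
        # The `with` block closes the writer (and the file) after each chunk.
        with parquet.ParquetWriter(where=path, schema=schema) as writer:
            for batch in chunk:
                writer.write_batch(batch)
        part += 1


if __name__ == "__main__":
    schema = pa.schema([("x", pa.int64())])
    write_rotating(iter_batches(), schema, batches_per_file=4)
```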