jpfeuffer opened a new issue, #43604: URL: https://github.com/apache/arrow/issues/43604
### Describe the bug, including details regarding any error messages, version, and platform.

I am getting reproducible segfaults when trying to close a CSVWriter in the simplest code possible.

```python
import pyarrow as pa
from pyarrow import csv
import argparse

parser = argparse.ArgumentParser(description="Converts a potentially bz2/gz compressed csv file to a tsv/smi file with no header and just 'smiles' and 'id' columns.")
parser.add_argument("input_file", help="Path to the input bz2 file")
parser.add_argument("output_file", help="Path to the output csv file")
parser.add_argument("--chunksize", type=int, default=2**31, help="Number of rows to process at a time")
parser.add_argument("--num_chunks", type=int, default=0, help="Number of chunks to process")
args = parser.parse_args()

input_file = args.input_file
output_file = args.output_file
chunksize = args.chunksize

readopt = csv.ReadOptions(encoding="latin-1", block_size=chunksize)
parseopt = csv.ParseOptions(delimiter="\t")
convertopt = csv.ConvertOptions(column_types={"smiles": pa.string(), "id": pa.string()}, include_columns=["smiles", "id"])

reader = csv.open_csv(
    input_file,
    read_options=readopt,
    parse_options=parseopt,
    convert_options=convertopt
)

writeopt = csv.WriteOptions(include_header=False, delimiter="\t", quoting_style="none")

# schema is just smiles and id column, both strings
schema = pa.schema([
    ('smiles', pa.string()),
    ('id', pa.string())
])

with csv.CSVWriter(sink=output_file, schema=schema, write_options=writeopt) as writer:
    cnt = 0
    while cnt < args.num_chunks or args.num_chunks == 0:
        try:
            writer.write_batch(reader.read_next_batch())
            cnt += 1
            print("Finished chunk ", cnt)
        except StopIteration:
            break
    print("Done. Closing file.")
```

linux aarch64, pyarrow 17 (tried both pip and conda-forge), also tried many combinations of filesystem, pa.OSFile, batch_sizes etc.

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org