jpfeuffer opened a new issue, #43604:
URL: https://github.com/apache/arrow/issues/43604

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am getting reproducible segfaults when closing a `CSVWriter`, even in the
simplest possible code.
   
   ```python
   import pyarrow as pa
   from pyarrow import csv
   import argparse
   
   parser = argparse.ArgumentParser(description="Converts a potentially bz2/gz compressed csv file to a tsv/smi file with no header and just 'smiles' and 'id' columns.")
   parser.add_argument("input_file", help="Path to the input bz2 file")
   parser.add_argument("output_file", help="Path to the output csv file")
   parser.add_argument("--chunksize", type=int, default=2**31, help="Number of rows to process at a time")
   parser.add_argument("--num_chunks", type=int, default=0, help="Number of chunks to process")
   
   args = parser.parse_args()
   
   input_file = args.input_file
   output_file = args.output_file
   chunksize = args.chunksize
   
   
   readopt = csv.ReadOptions(encoding="latin-1", block_size=chunksize)
   parseopt = csv.ParseOptions(delimiter="\t")
   convertopt = csv.ConvertOptions(
       column_types={"smiles": pa.string(), "id": pa.string()},
       include_columns=["smiles", "id"]
   )
   
   
   reader = csv.open_csv(
       input_file,
       read_options=readopt,
       parse_options=parseopt,
       convert_options=convertopt
   )
   
   writeopt = csv.WriteOptions(include_header=False, delimiter="\t", quoting_style="none")
   # schema is just smiles and id column, both strings
   schema = pa.schema([
       ('smiles', pa.string()),
       ('id', pa.string())
   ])
   
   with csv.CSVWriter(sink=output_file, schema=schema, write_options=writeopt) as writer:
   
       cnt = 0
       while cnt < args.num_chunks or args.num_chunks == 0:
           try:
               writer.write_batch(reader.read_next_batch())
               cnt += 1
               print("Finished chunk ", cnt)
           except StopIteration:
               break
   
       print("Done. Closing file.")
   ```
   
   Platform: Linux aarch64, pyarrow 17 (tried both the pip and conda-forge
builds). I also tried many combinations of filesystem, `pa.OSFile`, batch sizes, etc.
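
The input file is not attached to the report. For anyone trying to reproduce, a stand-in can be generated with the standard library alone. This is only a sketch under assumptions: the script above implies a tab-separated file whose header row contains at least `smiles` and `id` columns, but the function name `write_sample_tsv`, the extra column, and the placeholder rows are all hypothetical and not taken from the original data.

```python
import bz2
import csv
import os
import tempfile


def write_sample_tsv(path, num_rows=1000, compress=False):
    """Write a tab-separated file with a header row and 'smiles'/'id' columns.

    Hypothetical helper: the rows are made-up placeholders, not the
    data used in the report.
    """
    # bz2.open and the built-in open both accept text mode with an encoding.
    opener = bz2.open if compress else open
    # newline="" lets the csv module control line endings itself.
    with opener(path, "wt", encoding="latin-1", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # A third column is included so that include_columns has something to drop.
        writer.writerow(["smiles", "id", "extra"])
        for i in range(num_rows):
            writer.writerow(["C" * (i % 5 + 1), f"mol{i}", str(i)])
    return path


sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
write_sample_tsv(sample_path, num_rows=10)
```

The resulting path can then be passed as `input_file` to the script above. Passing `compress=True` with a path ending in `.bz2` should exercise the compressed-input case as well.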
   
   ### Component(s)
   
   Python

