rtyler opened a new issue, #9657:
URL: https://github.com/apache/arrow-rs/issues/9657

   **Describe the bug**
   
   arrow-csv writes `\` characters from `Utf8` columns as a bare `\` in the output, 
which lousier CSV parsers, like those written in C/C++, interpret as the start of a string 
escape sequence and then corrupt the output stream.
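
To illustrate the failure mode, here is a minimal sketch of how an escape-interpreting reader might mangle a field. `unescape_c_style` is a hypothetical stand-in for such a parser, not code from arrow-csv or from any specific C/C++ library, and the set of escapes it handles is an assumption.

```rust
/// Hypothetical reader-side helper that treats `\` as a C-style escape
/// character. It models the class of parser described in this issue.
fn unescape_c_style(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut chars = field.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash consumes the following character as an escape code.
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('\\') => out.push('\\'),
            // Unknown escape: the backslash is silently dropped.
            Some(other) => out.push(other),
            // Trailing backslash with nothing after it is dropped too.
            None => {}
        }
    }
    out
}
```

With this reader, a field written as `C:\new\table` comes back as `C:`, a newline, `ew`, a tab, and `able`, while a field written as `C:\\new` comes back as `C:\new`.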
   
   
   **To Reproduce**
   
   **Expected behavior**
   
   Arguably those bad CSV parsers should be less bad, but IMHO it's a safe 
operation to convert `\` to `\\` in the output stream out of an abundance of 
caution.
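
The conversion itself is small. A standalone sketch, assuming a hypothetical helper named `escape_backslashes` (the attached patch does the equivalent inline with `str::replace`):

```rust
/// Hypothetical helper: double every backslash so that a reader which
/// treats `\` as an escape character decodes it back to a single `\`.
fn escape_backslashes(field: &str) -> String {
    // `str::replace` returns a new String and leaves the input untouched.
    field.replace('\\', "\\\\")
}
```

Fields without a backslash pass through unchanged. Note that a reader which does not interpret `\` as an escape would see both backslashes.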
   
   
   **Additional context**
   
   
   ```patch
   From 2a7615200965a68c4808efe021b0414e6e155135 Mon Sep 17 00:00:00 2001
   From: "R. Tyler Croy" <[email protected]>
   Date: Thu, 2 Apr 2026 18:24:19 +0000
   Subject: [PATCH] chore: properly escape backslashes in CSV output of
    strings
   
   Signed-off-by: R. Tyler Croy <[email protected]>
   ---
    arrow-csv/src/writer.rs | 31 +++++++++++++++++++++++++++++++
    1 file changed, 31 insertions(+)
   
   diff --git a/arrow-csv/src/writer.rs b/arrow-csv/src/writer.rs
   index c38d1cdec33..8c7f50b3ca8 100644
   --- a/arrow-csv/src/writer.rs
   +++ b/arrow-csv/src/writer.rs
   @@ -293,6 +293,13 @@ impl<W: Write> Writer<W> {
                        ))
                    })?;
   
   +                let data_type = batch.schema().field(col_idx).data_type().clone();
   +
   +                if data_type == DataType::Utf8 || data_type == DataType::LargeUtf8 {
   +                    // Double each backslash so escape-interpreting CSV readers decode a literal `\`
   +                    buffer = str::replace(&buffer, "\\", "\\\\");
   +                }
   +
                    let field_bytes =
                        self.get_trimmed_field_bytes(&buffer, batch.column(col_idx).data_type());
                    byte_record.push_field(field_bytes);
   @@ -1358,4 +1365,28 @@ sed do eiusmod tempor,-556132.25,1,,2019-04-18T02:45:55.555,23:46:03,foo
                write_quote_style_with_null(&batch, QuoteStyle::Always, "NULL")
            );
        }
   +
   +    #[test]
   +    fn test_write_with_backslashes() {
   +        let schema = Schema::new(vec![
   +            Field::new("text", DataType::Utf8, true),
   +            Field::new("number", DataType::Int32, true),
   +        ]);
   +
   +        let text = StringArray::from(vec![Some(r"\"), None, Some("world")]);
   +        let number = Int32Array::from(vec![Some(1), Some(2), None]);
   +
   +        let batch =
   +            RecordBatch::try_new(Arc::new(schema), vec![Arc::new(text), Arc::new(number)]).unwrap();
   +
   +        // Test with QuoteStyle::Always
   +        assert_eq!(
   +            r#""text","number"
   +"\\","1"
   +"","2"
   +"world",""
   +"#,
   +            write_quote_style(&batch, QuoteStyle::Always)
   +        );
   +    }
    }
   --
   2.43.0
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.