This is an automated email from the ASF dual-hosted git repository.

liaoxin pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
     new 17d351af80f [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347) (#38445)
17d351af80f is described below

commit 17d351af80fedf94f285e67fb04255ed6664e65f
Author: hui lai <1353307...@qq.com>
AuthorDate: Mon Jul 29 14:55:45 2024 +0800

    [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347) (#38445)
    
    The csv reader parses data incorrectly when an enclosed field contains
    the line_delimiter. For example, with line_delimiter `\n` and enclose `'`,
    given the following data:
    ```
    'aaaaaaaaaaaa
    bbbb'
    ```
    it is parsed as two columns, `'aaaaaaaaaaaa` and `bbbb'`, rather
    than as one column:
    ```
    'aaaaaaaaaaaa
    bbbb'
    ```
    
    This happens because the csv reader does not reset the result when the
    closing enclose is not matched within the current `output_buf_read`,
    which causes the line to be truncated incorrectly.
    
    Co-authored-by: Xin Liao <liaoxin...@126.com>
---
 be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
index 415e4c1e349..9a09a90d1aa 100644
--- a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
+++ b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
@@ -161,6 +161,11 @@ void EncloseCsvLineReaderContext::_on_pre_match_enclose(const uint8_t* start, si
         if (_idx != _total_len) {
             len = update_reading_bound(start);
         } else {
+            // It needs to set the result to nullptr for matching enclose may not be read
+            // after reading the output buf.
+            // Therefore, if the result is not set to nullptr,
+            // the parser will consider reading a line as there is a line delimiter.
+            _result = nullptr;
             break;
         }
     } while (true);
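
The failure mode described in the commit message can be reproduced with a small
standalone model. The sketch below is not the Doris implementation: the names
`find_line_end`, `read_lines`, and the `reset_on_unclosed` flag are hypothetical
and exist only for illustration. It assumes a scanner that tentatively records a
line delimiter seen inside an enclosure, and a buffered reader that feeds it
data in chunks. The flag stands in for the `_result = nullptr;` line added by
the patch.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of an enclose-aware line scanner.
// Returns the index of the delimiter that ends the first line in `buf`,
// or std::nullopt when more data has to be read before deciding.
std::optional<std::size_t> find_line_end(const std::string& buf, char enclose,
                                         char line_delimiter, bool reset_on_unclosed) {
    std::optional<std::size_t> result;
    bool in_enclose = false;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const char c = buf[i];
        if (c == enclose) {
            in_enclose = !in_enclose;
            // Closing enclose found: delimiters seen inside it were plain data.
            if (!in_enclose) result.reset();
        } else if (c == line_delimiter) {
            if (!in_enclose) return i;   // a real end of line
            if (!result) result = i;     // tentative match inside an enclosure
        }
    }
    // The buffer ended before the closing enclose was read. Without the reset,
    // the stale tentative match is reported as a line end (the bug).
    if (in_enclose && reset_on_unclosed) result.reset();
    return result;
}

// Hypothetical buffered reader: appends each chunk to a buffer and emits
// every line the scanner reports as complete.
std::vector<std::string> read_lines(const std::vector<std::string>& chunks, char enclose,
                                    char line_delimiter, bool reset_on_unclosed) {
    std::vector<std::string> lines;
    std::string buf;
    for (const std::string& chunk : chunks) {
        buf += chunk;
        while (auto end = find_line_end(buf, enclose, line_delimiter, reset_on_unclosed)) {
            lines.push_back(buf.substr(0, *end));
            buf.erase(0, *end + 1);
        }
    }
    if (!buf.empty()) lines.push_back(buf);  // trailing data without a delimiter
    return lines;
}
```

Feeding the chunks `"'aaaaaaaaaaaa\nbb"` and `"bb'\n"` with
`reset_on_unclosed = false` yields two pieces, `'aaaaaaaaaaaa` and `bbbb'`,
which mirrors the reported behavior. With `reset_on_unclosed = true` the reader
waits for the second chunk and returns the single value `'aaaaaaaaaaaa\nbbbb'`.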


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
