This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 469c8b7ece [Fix](JSON LOAD)fix json load issue when string conform with RFC 4627 #21390
469c8b7ece is described below

commit 469c8b7ece427302a9cd824ffccde88389093279
Author: GoGoWen <82132356+gogo...@users.noreply.github.com>
AuthorDate: Sun Jul 9 17:16:03 2023 +0800

    [Fix](JSON LOAD)fix json load issue when string conform with RFC 4627 #21390
    
    On master, enable_simdjson_reader=false should be set, because master defaults to enable_simdjson_reader=true.
    
    Issue Number: close #21389
    
    From the RapidJSON documentation:
    
    Query String
    In addition to GetString(), the Value class also contains GetStringLength(). Here is why:
    
    According to RFC 4627, JSON strings can contain the Unicode character U+0000, which must be escaped as "\u0000". The problem is that C/C++ code often uses null-terminated strings, which treat \0 as the terminator symbol.
    
    To conform with RFC 4627, RapidJSON supports strings containing the U+0000 character. If you need to handle this, you can use GetStringLength() to obtain the correct string length.
    
    For example, after parsing the following JSON to Document d:
    
    { "s" : "a\u0000b" }

    The correct length of the string "a\u0000b" is 3, as returned by GetStringLength(). But strlen() returns 1.
    
    GetStringLength() can also improve performance, since users would otherwise often need to call strlen() to allocate a buffer.
    
    Besides, std::string also supports a constructor:
    
    string(const char* s, size_t count);

    which accepts the length of the string as a parameter. This constructor supports storing null characters within the string, and should also provide better performance.
---
 be/src/vec/exec/format/json/new_json_reader.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/src/vec/exec/format/json/new_json_reader.cpp b/be/src/vec/exec/format/json/new_json_reader.cpp
index 157b8a63e9..f6eabaa7cd 100644
--- a/be/src/vec/exec/format/json/new_json_reader.cpp
+++ b/be/src/vec/exec/format/json/new_json_reader.cpp
@@ -889,7 +889,7 @@ Status NewJsonReader::_write_data_to_column(rapidjson::Value::ConstValueIterator
     switch (value->GetType()) {
     case rapidjson::Type::kStringType:
         str_value = value->GetString();
-        wbytes = strlen(str_value);
+        wbytes = value->GetStringLength();
         break;
     case rapidjson::Type::kNumberType:
         if (value->IsUint()) {


