This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/master by this push:
     new 469c8b7ece [Fix](JSON LOAD)fix json load issue when string conform with RFC 4627 #21390
469c8b7ece is described below

commit 469c8b7ece427302a9cd824ffccde88389093279
Author: GoGoWen <82132356+gogo...@users.noreply.github.com>
AuthorDate: Sun Jul 9 17:16:03 2023 +0800

    [Fix](JSON LOAD)fix json load issue when string conform with RFC 4627 #21390

    should set: enable_simdjson_reader=false in master as master enable_simdjson_reader=true by default.
    Issue Number: close #21389

    from rapidjson:
    Query String
    In addition to GetString(), the Value class also contains GetStringLength(). Here explains why:

    According to RFC 4627, JSON strings can contain Unicode character U+0000, which must be escaped as "\u0000". The problem is that, C/C++ often uses null-terminated string, which treats \0 as the terminator symbol.

    To conform with RFC 4627, RapidJSON supports string containing U+0000 character. If you need to handle this, you can use GetStringLength() to obtain the correct string length.

    For example, after parsing the following JSON to Document d:

    { "s" : "a\u0000b" }

    The correct length of the string "a\u0000b" is 3, as returned by GetStringLength(). But strlen() returns 1.

    GetStringLength() can also improve performance, as user may often need to call strlen() for allocating buffer.

    Besides, std::string also support a constructor:

    string(const char* s, size_t count);

    which accepts the length of string as parameter. This constructor supports storing null character within the string, and should also provide better performance.
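
The length mismatch described in the commit message can be reproduced without RapidJSON. The following is a minimal sketch using only the C++ standard library. `ParsedString`, `sample_value`, `length_via_strlen`, and `copy_with_length` are hypothetical names introduced for illustration. They stand in for the pointer and length that `GetString()` and `GetStringLength()` would return for the value `"a\u0000b"`, and are not part of Doris or RapidJSON.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a parsed JSON string value: a character
// buffer plus an explicit length, as a parser would report them.
struct ParsedString {
    const char* data;
    std::size_t length;
};

// Models the decoded value "a\u0000b": three payload bytes followed by
// the usual terminating NUL.
ParsedString sample_value() {
    static const char buf[] = {'a', '\0', 'b', '\0'};
    return ParsedString{buf, 3};
}

// The pre-fix approach: strlen stops at the first NUL byte, so an
// embedded U+0000 truncates the reported length.
std::size_t length_via_strlen(const ParsedString& v) {
    return std::strlen(v.data);
}

// The post-fix approach: use the explicit length, here through the
// std::string(const char*, size_t) constructor, which keeps embedded NULs.
std::string copy_with_length(const ParsedString& v) {
    assert(v.data != nullptr);
    return std::string(v.data, v.length);
}
```

For the sample value, `length_via_strlen` reports 1 while `copy_with_length` yields a 3-byte string whose last byte is `'b'`, which matches the behaviour the quoted RapidJSON documentation describes.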
---
 be/src/vec/exec/format/json/new_json_reader.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/src/vec/exec/format/json/new_json_reader.cpp b/be/src/vec/exec/format/json/new_json_reader.cpp
index 157b8a63e9..f6eabaa7cd 100644
--- a/be/src/vec/exec/format/json/new_json_reader.cpp
+++ b/be/src/vec/exec/format/json/new_json_reader.cpp
@@ -889,7 +889,7 @@ Status NewJsonReader::_write_data_to_column(rapidjson::Value::ConstValueIterator
         switch (value->GetType()) {
         case rapidjson::Type::kStringType:
             str_value = value->GetString();
-            wbytes = strlen(str_value);
+            wbytes = value->GetStringLength();
             break;
         case rapidjson::Type::kNumberType:
             if (value->IsUint()) {