airborne12 opened a new pull request, #284:
URL: https://github.com/apache/doris-thirdparty/pull/284

   …
   This pull request improves the handling of UTF-8 encoding in the `CLucene` library and adds tests to verify the changes. The most important changes are modifications to the `IndexInput` and `IndexOutput` classes so that they handle UTF-8 encoding more accurately, in particular 4-byte characters, and new test cases for UTF-8 characters.
   
   ### Improvements to UTF-8 encoding handling:
   
   * [`src/core/CLucene/store/IndexInput.cpp`](diffhunk://#diff-67ecca0c03c369fefa9a51e2f56262efc49a687e097d57ebeab1e78eeb869d72L138-R150): Modified the handling of byte sequences to differentiate between incorrect and correct UTF-8 encoding, providing a temporary solution to handle 4-byte characters.
   * [`src/core/CLucene/store/IndexOutput.cpp`](diffhunk://#diff-64611e13ecbcf9b6e9b84e045e7bf35be98a9da95c04b7a71e075399a60ec888L179-R183): Updated the writing of byte sequences for 4-byte characters to differentiate between incorrect and correct UTF-8 encoding, providing a temporary solution. [[1]](diffhunk://#diff-64611e13ecbcf9b6e9b84e045e7bf35be98a9da95c04b7a71e075399a60ec888L179-R183) [[2]](diffhunk://#diff-64611e13ecbcf9b6e9b84e045e7bf35be98a9da95c04b7a71e075399a60ec888L216-R224)
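
For background on what a "4-byte character" means here, the following is a minimal, standalone sketch of standard UTF-8 encoding and decoding for code points up to U+10FFFF. It is not the code from this pull request: `encodeUtf8`, `decodeUtf8`, and `kInvalid` are hypothetical names chosen for illustration, and the description above does not say which byte layout the PR treats as the "incorrect" encoding. As one plausible example of a non-standard form, the decoder below rejects surrogate code points that were encoded as 3-byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sentinel returned by decodeUtf8 for malformed input.
constexpr std::uint32_t kInvalid = 0xFFFFFFFFu;

// Encode one Unicode scalar value as standard UTF-8 (1 to 4 bytes).
std::string encodeUtf8(std::uint32_t cp) {
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Supplementary planes: the 4-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Decode the code point that starts at s[pos]. On success, advance pos past
// the sequence and return the code point; on malformed input, leave pos
// unchanged and return kInvalid.
std::uint32_t decodeUtf8(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) return kInvalid;
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return kInvalid;  // stray continuation byte or invalid lead byte
    if (pos + len > s.size()) return kInvalid;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return kInvalid;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Smallest code point that legitimately needs a sequence of each length.
    static const std::uint32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < minForLen[len]) return kInvalid;               // overlong form
    if (cp > 0x10FFFF) return kInvalid;                     // out of range
    if (cp >= 0xD800 && cp <= 0xDFFF) return kInvalid;      // encoded surrogate
    pos += len;
    return cp;
}
```

With this sketch, U+1F600 encodes to the four bytes `F0 9F 98 80` and decodes back to the same code point, while the 3-byte sequence `ED A0 BD` (a surrogate written as if it were a character) is reported as invalid. A reader that must stay compatible with previously written data may need to accept such legacy sequences rather than reject them; how this PR handles that case is not described in the summary above.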
   
   ### Addition of new test cases:
   
   * [`src/test/CMakeLists.txt`](diffhunk://#diff-921b2054f6bf380eb08d5c3c21cf8d1c7cfee3736227d611400ae1a13ab3d187R113): Added `TestUTF8Chars.cpp` to the list of test files to be compiled.
   * [`src/test/test.h`](diffhunk://#diff-993fc9d73840fa074470653e9f8a1e53afc4388b8bc671cd28ecbcfbea8b97b1R93): Declared the `testUTF8CharsSuite` function to include the new UTF-8 character tests.
   * [`src/test/tests.cpp`](diffhunk://#diff-f21ef3314c226873fefc19da14a6e0561cbac2d2ec0b7ef8eb022d4edc2a25a1L9-R9): Added `TestUTF8Chars` to the list of unit tests to be executed. [[1]](diffhunk://#diff-f21ef3314c226873fefc19da14a6e0561cbac2d2ec0b7ef8eb022d4edc2a25a1L9-R9) [[2]](diffhunk://#diff-f21ef3314c226873fefc19da14a6e0561cbac2d2ec0b7ef8eb022d4edc2a25a1R26)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

