Ryan19929 opened a new pull request, #49519:
URL: https://github.com/apache/doris/pull/49519

   ### What problem does this PR solve?
   
   Issue Number: close #35411 
   
   Related PR: #xxx
   
   Problem Summary:
   This pull request adds support for the IK tokenizer. Key changes include 
porting the IK tokenizer from Java to C++, updating the inverted index parser 
to recognize the IK parser type, and adding new test cases for the IK 
analyzer.
   ## IK Integration:
   - `be/CMakeLists.txt`: Added installation of IK dict files to the output 
directory.
   ## Inverted Index Parser Updates:
   - `be/src/olap/inverted_index_parser.cpp`: Added support for the PARSER_IK 
type in the `inverted_index_parser_type_to_string`, 
`get_inverted_index_parser_type_from_string` and 
`get_parser_mode_string_from_properties` functions.
   - `be/src/olap/inverted_index_parser.h`: Defined PARSER_IK in the 
InvertedIndexParserType enum and added the corresponding string constant. 
   - `be/src/olap/rowset/segment_v2/inverted_index/analyzer/analyzer.cpp`: 
Included the IKAnalyzer header and updated the create_analyzer function to 
handle the PARSER_IK type.
   - `be/src/vec/functions/function_tokenize.cpp`: Updated the error message.
   - `fe/fe-core/src/main/java/org/apache/doris/analysis/InvertedIndexUtil.java`: 
Added support for the IK analyzer.
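
For reviewers, the parser-type changes listed above roughly amount to registering one more enumerator in both directions of a string mapping. The following is a minimal standalone sketch, not the code in this PR: `ParserType`, `kParserIk`, `parser_type_to_string` and `parser_type_from_string` are hypothetical stand-ins, the `"ik"` string value is an assumption, and only the enumerator name `PARSER_IK` comes from the description. Whether the real lookup is case-insensitive is also an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for InvertedIndexParserType; only PARSER_IK is
// named in the PR description, the other enumerators are illustrative.
enum class ParserType { PARSER_UNKNOWN, PARSER_NONE, PARSER_IK };

// Assumed string constant for the IK parser; the real value may differ.
const char* const kParserIk = "ik";

// Enum -> string, with IK added as one more case.
std::string parser_type_to_string(ParserType type) {
    switch (type) {
    case ParserType::PARSER_NONE:
        return "none";
    case ParserType::PARSER_IK:
        return kParserIk;
    default:
        return "unknown";
    }
}

// String -> enum. Lowercasing the input first is an assumption made for
// this sketch, not a documented behavior of the real function.
ParserType parser_type_from_string(const std::string& name) {
    std::string lower(name);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (lower == "none") {
        return ParserType::PARSER_NONE;
    }
    if (lower == kParserIk) {
        return ParserType::PARSER_IK;
    }
    return ParserType::PARSER_UNKNOWN;
}
```

Keeping the two functions symmetric means a parser name written into index properties can be read back to the same enum value, which is what an analyzer factory such as `create_analyzer` would then switch on.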
   ## Test Cases
    
   - `regression-test/suites/inverted_index_p0/test_ik_analyzer.groovy`: Added 
a new test suite for the IK analyzer, including table creation, data insertion, 
and query validation.
   - `regression-test/suites/inverted_index_p0/test_tokenize.groovy`: Added 
test cases for tokenizing text using the IK parser.
   - `regression-test/data/inverted_index_p0/test_ik_analyzer.out`: Added 
expected output for the IK analyzer test cases.
   - `regression-test/data/inverted_index_p0/test_tokenize.out`: Added expected 
output for the tokenization test cases using the IK parser.
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [X] Regression test
       - [X] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [X] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [X] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org