Ryan19929 opened a new pull request, #49519: URL: https://github.com/apache/doris/pull/49519
### What problem does this PR solve?

Issue Number: close #35411

Related PR: #xxx

Problem Summary:

This pull request adds support for the IK tokenizer. Key changes include porting the IK tokenizer from Java to C++, updating the inverted index parser to work with IK, and adding new test cases for the IK analyzer.

## IK Integration:

- `be/CMakeLists.txt`: Added installation of the IK dictionary files to the output directory.

## Inverted Index Parser Updates:

- `be/src/olap/inverted_index_parser.cpp`: Added support for the `PARSER_IK` type in the `inverted_index_parser_type_to_string`, `get_inverted_index_parser_type_from_string`, and `get_parser_mode_string_from_properties` functions.
- `be/src/olap/inverted_index_parser.h`: Defined `PARSER_IK` in the `InvertedIndexParserType` enum and added the corresponding string constant.
- `be/src/olap/rowset/segment_v2/inverted_index/analyzer/analyzer.cpp`: Included the `IKAnalyzer` header and updated the `create_analyzer` function to handle the `PARSER_IK` type.
- `be/src/vec/functions/function_tokenize.cpp`: Updated the error message.
- `fe/fe-core/src/main/java/org/apache/doris/analysis/InvertedIndexUtil.java`: Added support for the IK analyzer.

## Test Cases

- `regression-test/suites/inverted_index_p0/test_ik_analyzer.groovy`: Added a new test suite for the IK analyzer, covering table creation, data insertion, and query validation.
- `regression-test/suites/inverted_index_p0/test_tokenize.groovy`: Added test cases for tokenizing text with the IK parser.
- `regression-test/data/inverted_index_p0/test_ik_analyzer.out`: Added the expected output for the IK analyzer test cases.
- `regression-test/data/inverted_index_p0/test_tokenize.out`: Added the expected output for the tokenization test cases that use the IK parser.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included.
-->
    - [X] Regression test
    - [X] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [X] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [X] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------

For additional commands, e-mail: commits-h...@doris.apache.org