Baymine opened a new pull request, #50547: URL: https://github.com/apache/doris/pull/50547
### What problem does this PR solve?

The previous lexer rule for `BRACKETED_COMMENT` in `DorisLexer.g4` incorrectly handled unclosed comments (comments starting with `/*` but not ending with `*/` before the end of the input).

When an unclosed comment occurred, the rule's alternative ending condition (`| {markUnclosedComment();} EOF`) allowed the rule to **successfully match** by consuming all subsequent characters up to the end of the file (EOF) into a single `BRACKETED_COMMENT` token. This effectively hid valid SQL code following the unclosed comment start from the parser.

For example, consider the following SQL:

```sql
select * from t1 /*this is /*a comment*/ limit 1;
```

Under the old rule:

1. The lexer started matching `BRACKETED_COMMENT` at the first `/*`.
2. It recursively handled the inner `/*a comment*/`.
3. It continued trying to find the closing `*/` for the *outer* comment.
4. Upon reaching EOF, it matched the `| {markUnclosedComment();} EOF` condition.
5. The *entire* string `/*this is /*a comment*/ limit 1;` was consumed as **one** `BRACKETED_COMMENT` token.
6. The parser only saw `SELECT * FROM t1`, leading to the incorrect result of returning all rows instead of just one.

Problem Summary:

The primary goal of this change is to correctly support **unclosed nested bracketed comments** (e.g., `/* outer /* inner */`) within SQL statements parsed by the Doris Nereids lexer. This aligns with standard SQL practices, including MySQL, which allows such nesting.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [x] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? 
-->

```sql
-- Setup
CREATE TABLE IF NOT EXISTS t1 (id INT)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ('replication_num' = '1');

INSERT INTO t1 VALUES (1), (2), (3);

-- Test Case: Unclosed Comment (Original Bug Case - Nested)
select * from t1 /* outer /* inner */ limit 1;
-- EXPECTED: Returns 1 row.
-- OBSERVED: Returns 3 rows.

-- Test Case: Valid query using table after nested comment
select * from t1 /* outer /* inner */ limit 1;
-- EXPECTED: Returns 1 row.
-- OBSERVED: Returns 1 row.
```

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@doris.apache.org
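
The old `BRACKETED_COMMENT` behavior described in this pull request can be illustrated with a small Python model. This is only a sketch under stated assumptions: it is not the ANTLR rule from `DorisLexer.g4`, it does not show the fix itself, and the function names `scan_nested_comment` and `visible_sql` are hypothetical. It tracks nesting depth and, like the `| {markUnclosedComment();} EOF` alternative, lets an unclosed comment swallow everything up to the end of the input.

```python
from typing import Tuple


def scan_nested_comment(sql: str, start: int) -> Tuple[int, bool]:
    """Model of the old rule as the description explains it.

    Scans a bracketed comment beginning at `start`, honouring nested
    `/* ... */` pairs. Returns (end_index, closed). If the outer comment
    is never closed, the scan runs to end of input and closed is False.
    """
    assert sql.startswith("/*", start)
    depth = 0
    i = start
    while i < len(sql):
        if sql.startswith("/*", i):
            depth += 1          # entering an (inner) comment
            i += 2
        elif sql.startswith("*/", i):
            depth -= 1          # leaving one nesting level
            i += 2
            if depth == 0:
                return i, True  # outer comment properly closed
        else:
            i += 1
    # EOF reached with depth > 0: the whole tail becomes one comment token.
    return len(sql), False


def visible_sql(sql: str) -> str:
    """Return the text the parser would see once comment tokens are dropped."""
    out = []
    i = 0
    while i < len(sql):
        if sql.startswith("/*", i):
            i, _closed = scan_nested_comment(sql, i)
        else:
            out.append(sql[i])
            i += 1
    return "".join(out)


query = "select * from t1 /*this is /*a comment*/ limit 1;"
print(repr(visible_sql(query)))  # 'select * from t1 '
```

In this model the `limit 1` clause never reaches the parser, which matches the three-row result observed in the manual test.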