[ https://issues.apache.org/jira/browse/GEODE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023516#comment-18023516 ]
ASF subversion and git services commented on GEODE-10463:
---------------------------------------------------------
Commit dbdec41174b127d2304fdebba6b70f153e543081 in geode's branch
refs/heads/develop from Jinwoo Hwang
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=dbdec41174 ]
[GEODE-10463] Fix lexical nondeterminism warning in OQL grammar between
ALL_UNICODE and DIGIT rules (#7928)
* GEODE-10463: Fix lexical nondeterminism warning in OQL grammar between
ALL_UNICODE and DIGIT rules
Refactored the ALL_UNICODE rule to exclude Unicode digit ranges that overlap
with the DIGIT rule, eliminating lexical ambiguity in RegionNameCharacter.
The ALL_UNICODE range is now split into 15 non-overlapping segments that
exclude the Arabic-Indic, Devanagari, Bengali, and other Unicode digit ranges.
This ensures deterministic tokenization: Unicode digits are always
matched by the DIGIT rule, while other Unicode characters use ALL_UNICODE.
* GEODE-10463: Add clarifying comment for ALL_UNICODE lexer rule
Add a documentation comment explaining that the ALL_UNICODE character
class excludes Unicode digit ranges to prevent lexical nondeterminism
with the DIGIT rule in the OQL grammar lexer.
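The commit describes splitting ALL_UNICODE into segments that skip the digit ranges. Below is a minimal Java sketch of that range subtraction. It assumes the excluded ranges are sorted and non-overlapping. The class `UnicodeRangeSplitter` and its methods are hypothetical illustrations, not part of Geode or ANTLR. The sketch uses only the three digit ranges named in the build warning, so it yields 4 segments rather than the 15 produced by the actual commit. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a character range around excluded sub-ranges. */
public final class UnicodeRangeSplitter {

    /** A closed code-point range lo..hi, like an ANTLR character range. */
    public record CodeRange(int lo, int hi) {
        @Override
        public String toString() {
            // Render in the grammar's quoted-escape range notation.
            return String.format("'\\u%04x'..'\\u%04x'", lo, hi);
        }
    }

    /**
     * Returns the parts of outer not covered by excluded.
     * Assumes excluded is sorted by lo and its ranges do not overlap.
     */
    public static List<CodeRange> subtract(CodeRange outer, List<CodeRange> excluded) {
        List<CodeRange> segments = new ArrayList<>();
        int next = outer.lo(); // first code point not yet emitted or excluded
        for (CodeRange ex : excluded) {
            if (ex.hi() < next || ex.lo() > outer.hi()) {
                continue; // exclusion lies entirely outside the remaining range
            }
            if (ex.lo() > next) {
                segments.add(new CodeRange(next, ex.lo() - 1)); // gap before the exclusion
            }
            next = Math.max(next, ex.hi() + 1); // resume just past the exclusion
        }
        if (next <= outer.hi()) {
            segments.add(new CodeRange(next, outer.hi())); // tail after the last exclusion
        }
        return segments;
    }

    public static void main(String[] args) {
        // Original ALL_UNICODE: one range from U+0061 to U+FFFD.
        CodeRange allUnicode = new CodeRange(0x0061, 0xfffd);
        // Only the three digit ranges named in the build warning; the real DIGIT rule has more.
        List<CodeRange> digits = List.of(
                new CodeRange(0x0660, 0x0669),
                new CodeRange(0x06f0, 0x06f9),
                new CodeRange(0x0966, 0x096f));
        subtract(allUnicode, digits).forEach(System.out::println);
    }
}
```

Running `main` prints `'\u0061'..'\u065f'`, `'\u066a'..'\u06ef'`, `'\u06fa'..'\u0965'` and `'\u0970'..'\ufffd'`, one per line. The segments are formatted in the same notation as the grammar's ranges, so it is easy to compare them against the rule by eye.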
> Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and
> DIGIT rules
> -------------------------------------------------------------------------------------
>
> Key: GEODE-10463
> URL: https://issues.apache.org/jira/browse/GEODE-10463
> Project: Geode
> Issue Type: Improvement
> Reporter: Jinwoo Hwang
> Assignee: Jinwoo Hwang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 252h
> Remaining Estimate: 0h
>
> The ANTLR grammar generation for the OQL (Object Query Language) parser
> produces a lexical nondeterminism warning during the build process:
>
> * warning:lexical nondeterminism between alts 1 and 3 of block upon
> k==1:'\u0660'..'\u0669','\u06f0'..'\u06f9','\u0966'..'\u096f',...
> This warning occurs in the {{RegionNameCharacter}} lexer rule at line 155 of
> {{oql.g}} due to overlapping character ranges between the {{ALL_UNICODE}} and
> {{DIGIT}} rules.
> h3. Root Cause
> The {{ALL_UNICODE}} rule defined as {{('\u0061'..'\ufffd')}} includes Unicode
> digit ranges that are also explicitly defined in the {{DIGIT}} rule:
> * Arabic-Indic digits ({{\u0660-\u0669}})
> * Extended Arabic-Indic digits ({{\u06f0-\u06f9}})
> * Devanagari, Bengali, Gurmukhi, Gujarati, and other Unicode digit ranges
> When the lexer encounters these Unicode digits in region names, it cannot
> deterministically choose between matching them as {{ALL_UNICODE}} or
> {{DIGIT}} characters, creating lexical ambiguity.
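A "lexical nondeterminism upon k==1" warning means that a single lookahead character could begin either alternative, so the two alternatives' character sets intersect. That condition can be checked mechanically. Below is a hedged Java sketch. `LexerOverlapCheck` is a hypothetical name, and the DIGIT contents shown are an assumed subset: ASCII digits plus the three ranges listed in the warning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: finds code points shared by two lexer alternatives (k = 1). */
public final class LexerOverlapCheck {

    /** A closed code-point range lo..hi. */
    public record CodeRange(int lo, int hi) {}

    /** Returns every non-empty pairwise intersection between the two alternatives' ranges. */
    public static List<CodeRange> overlaps(List<CodeRange> altA, List<CodeRange> altB) {
        List<CodeRange> shared = new ArrayList<>();
        for (CodeRange a : altA) {
            for (CodeRange b : altB) {
                int lo = Math.max(a.lo(), b.lo());
                int hi = Math.min(a.hi(), b.hi());
                if (lo <= hi) {
                    shared.add(new CodeRange(lo, hi)); // characters either alternative could match
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // ALL_UNICODE as originally defined: a single range from U+0061 to U+FFFD.
        List<CodeRange> allUnicode = List.of(new CodeRange(0x0061, 0xfffd));
        // Assumed partial DIGIT rule: ASCII digits plus the ranges named in the warning.
        List<CodeRange> digit = List.of(
                new CodeRange(0x0030, 0x0039),
                new CodeRange(0x0660, 0x0669),
                new CodeRange(0x06f0, 0x06f9),
                new CodeRange(0x0966, 0x096f));
        // A non-empty result means the lexer cannot choose deterministically with one character.
        for (CodeRange r : overlaps(allUnicode, digit)) {
            System.out.printf("U+%04X..U+%04X%n", r.lo(), r.hi());
        }
    }
}
```

This prints `U+0660..U+0669`, `U+06F0..U+06F9` and `U+0966..U+096F`. The ASCII digits do not appear because they sit below U+0061, which matches the warning listing only non-ASCII digit ranges. If ALL_UNICODE is replaced by segments that skip those ranges, the same check returns an empty list.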
--
This message was sent by Atlassian Jira
(v8.20.10#820010)