Jinwoo Hwang created GEODE-10463:
------------------------------------
Summary: Fix lexical nondeterminism warning in OQL grammar between
ALL_UNICODE and DIGIT rules
Key: GEODE-10463
URL: https://issues.apache.org/jira/browse/GEODE-10463
Project: Geode
Issue Type: Improvement
Reporter: Jinwoo Hwang
The ANTLR grammar generation for the OQL (Object Query Language) parser
produces a lexical nondeterminism warning during the build process:
warning:lexical nondeterminism between alts 1 and 3 of block upon
k==1:'\u0660'..'\u0669','\u06f0'..'\u06f9','\u0966'..'\u096f',...
This warning occurs in the {{RegionNameCharacter}} lexer rule at line 155 of
{{oql.g}} due to overlapping character ranges between the {{ALL_UNICODE}} and
{{DIGIT}} rules.
h3. Root Cause
The {{ALL_UNICODE}} rule, defined as {{('\u0061'..'\ufffd')}}, includes Unicode
digit ranges that are also explicitly defined in the {{DIGIT}} rule:
* Arabic-Indic digits ({{\u0660-\u0669}})
* Extended Arabic-Indic digits ({{\u06f0-\u06f9}})
* Devanagari, Bengali, Gurmukhi, Gujarati, and other Unicode digit ranges
When the lexer encounters these Unicode digits in region names, it cannot
deterministically choose between matching them as {{ALL_UNICODE}} or {{DIGIT}}
characters, creating lexical ambiguity.
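The overlap can be checked outside the grammar with a few lines of Java. This is only an illustrative sketch: the class name {{RangeOverlapSketch}} and the method {{overlaps()}} are hypothetical and not part of Geode, and the table contains only the three digit ranges that are spelled out in the warning text. The ranges the warning elides are left out rather than guessed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Geode code: intersects the ALL_UNICODE range
// with the digit ranges quoted in the ANTLR warning.
public class RangeOverlapSketch {
    // ALL_UNICODE as described in the issue: U+0061 through U+FFFD.
    static final int ALL_UNICODE_LO = 0x0061;
    static final int ALL_UNICODE_HI = 0xFFFD;

    // Only the digit ranges the warning lists explicitly.
    static final int[][] DIGIT_RANGES = {
        {0x0660, 0x0669}, // Arabic-Indic digits
        {0x06F0, 0x06F9}, // Extended Arabic-Indic digits
        {0x0966, 0x096F}, // Devanagari digits
    };

    // Returns the part of each digit range that also lies inside ALL_UNICODE.
    static List<String> overlaps() {
        List<String> result = new ArrayList<>();
        for (int[] r : DIGIT_RANGES) {
            int lo = Math.max(r[0], ALL_UNICODE_LO);
            int hi = Math.min(r[1], ALL_UNICODE_HI);
            if (lo <= hi) { // non-empty intersection means both rules match
                result.add(String.format("U+%04X..U+%04X", lo, hi));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : overlaps()) {
            System.out.println("ambiguous: " + s);
        }
    }
}
```

Each listed digit range falls entirely inside {{ALL_UNICODE}}, so every code point in it is matched by both rules, which is the condition the warning reports at k==1.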
--
This message was sent by Atlassian Jira
(v8.20.10#820010)