[ 
https://issues.apache.org/jira/browse/GEODE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinwoo Hwang updated GEODE-10508:
---------------------------------
    Fix Version/s: 2.1.0

> Remediation of ANTLR Nondeterminism Warnings in OQL Grammar
> -----------------------------------------------------------
>
>                 Key: GEODE-10508
>                 URL: https://issues.apache.org/jira/browse/GEODE-10508
>             Project: Geode
>          Issue Type: Improvement
>            Reporter: Jinwoo Hwang
>            Assignee: Jinwoo Hwang
>            Priority: Major
>             Fix For: 2.1.0
>
>
> h2. Summary
> Fix nondeterminism warnings generated by ANTLR during compilation of the OQL 
> (Object Query Language) grammar file. The warnings indicate ambiguity in the 
> parser that prevents it from deterministically choosing between alternative 
> production rules.
> h2. Problem Statement
> h3. Current Behavior
> During the {{generateGrammarSource}} task, ANTLR produces the following 
> nondeterminism warnings:
> {noformat}
> > Task :geode-core:generateGrammarSource
> /home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:574:40:
>  
> warning:nondeterminism between alts 1 and 2 of block upon
>     k==1:"sum","avg","min","max","count"
>     k==2:TOK_LPAREN
> /home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:578:13:
>  
> warning:nondeterminism between alts 1 and 2 of block upon
>     k==1:"sum","avg","min","max","count"
>     k==2:TOK_LPAREN
> /home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:961:26:
>  
> warning:nondeterminism between alts 1 and 2 of block upon
>     k==1:"distinct"
>     k==2:TOK_LPAREN,TOK_PLUS,TOK_MINUS,TOK_DOLLAR,...
> /home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:979:17:
>  
> warning:nondeterminism between alts 1 and 2 of block upon
>     k==1:"distinct"
>     k==2:TOK_LPAREN,TOK_PLUS,TOK_MINUS,TOK_DOLLAR,...
> {noformat}
> h3. Root Cause Analysis
> *Lines 574 & 578 (projection rule):*
> The parser cannot distinguish between {{aggregateExpr}} and {{expr}} 
> alternatives when it encounters aggregate function keywords ({{{}sum{}}}, 
> {{{}avg{}}}, {{{}min{}}}, {{{}max{}}}, {{{}count{}}}) because:
>  * These keywords can start an aggregate expression like {{sum(field)}}
>  * The same keywords can also be used as identifiers in regular expressions
>  * Without lookahead, the parser cannot determine which production rule to 
> apply
> *Lines 961 & 979 (aggregateExpr rule):*
> The optional {{distinct}} keyword creates ambiguity:
>  * The parser cannot decide whether to match the optional {{distinct}} 
> keyword or skip it and proceed directly to the expression
>  * This is because both alternatives are valid, and the default behavior 
> doesn't specify preference
> h2. Proposed Solution
> h3. 1. Add Syntactic Predicates (Lines 574 & 578)
> Use ANTLR syntactic predicates to perform lookahead and resolve ambiguity in 
> the {{projection}} rule:
> {code}
> Unable to find source-code formatter for language: antlr. Available languages 
> are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
> groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, 
> perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
> yaml// Before:
> ( tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})
> // After:
> ( (("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> tok1:aggregateExpr{node = 
> #tok1;} | tok2:expr{node = #tok2;})
> {code}
> The predicate {{(("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=>}} tells the 
> parser:
>  * Look ahead to check if an aggregate keyword is followed by a left 
> parenthesis
>  * If true, choose the {{aggregateExpr}} alternative
>  * Otherwise, choose the {{expr}} alternative
> h3. 2. Add Greedy Option (Lines 961 & 979)
> Use the {{greedy}} option for optional {{distinct}} keywords in the 
> {{aggregateExpr}} rule:
> {code}
> Unable to find source-code formatter for language: antlr. Available languages 
> are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
> groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, 
> perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
> yaml// Before:
> TOK_LPAREN ("distinct"! {distinctOnly = true;} ) ? tokExpr1:expr TOK_RPAREN
> // After:
> TOK_LPAREN (options {greedy=true;}: "distinct"! {distinctOnly = true;} ) ? 
> tokExpr1:expr TOK_RPAREN
> {code}
> The {{greedy}} option instructs the parser to match the {{distinct}} keyword 
> whenever it appears, removing ambiguity.
> h3. 3. Fix Test Compatibility Issue
> The grammar changes will alter token numbering in the generated lexer. Update 
> {{AbstractCompiledValueTestJUnitTest.java}} to use constants instead of 
> hardcoded token values:
> {code:java}
> // Before:
> new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2}, 89)
> // After:
> import org.apache.geode.cache.query.internal.parse.OQLLexerTokenTypes;
> ...
> new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2}, 
> OQLLexerTokenTypes.LITERAL_or)
> {code}
> *Rationale:* Adding syntactic predicates changes token numbering (e.g., 
> {{LITERAL_or}} changes from 89 to 94). Using the constant ensures tests 
> remain correct regardless of future grammar modifications.
> h2. Implementation Plan
> h3. Phase 1: Grammar Modifications
>  # Update 
> {{geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g}}
>  # Add syntactic predicates to the {{projection}} rule (2 locations)
>  # Add {{greedy}} option to {{aggregateExpr}} rule (2 locations)
>  # Add inline comments explaining the reasoning for each change
> h3. Phase 2: Test Updates
>  # Update {{AbstractCompiledValueTestJUnitTest.java}}
>  # Add import for {{OQLLexerTokenTypes}}
>  # Replace hardcoded value {{89}} with {{OQLLexerTokenTypes.LITERAL_or}}
>  # Add comments explaining why the constant is used
> h3. Phase 3: Validation
>  # Run {{./gradlew :geode-core:generateGrammarSource}} to verify no warnings
>  # Run {{./gradlew :geode-core:test --tests 
> "org.apache.geode.cache.query.internal.parse.*"}} to verify parser tests pass
>  # Run {{./gradlew :geode-core:test --tests 
> "org.apache.geode.cache.query.internal.Abstract*"}} to verify updated test 
> passes
>  # Run full {{geode-core}} test suite to ensure no regressions
> h2. Expected Outcomes
> h3. Success Criteria
> (/) Zero nondeterminism warnings from ANTLR grammar generation
> (/) All existing query parsing tests pass
> (/) {{AbstractCompiledValueTestJUnitTest}} passes with updated token usage
> (/) No behavior changes to OQL query parsing or execution
> (/) Token numbering is stable and predictable
> h3. Impact Assessment
> *Risk Level:* Low
>  * Changes only affect parser generation, not runtime behavior
>  * Syntactic predicates and greedy options are standard ANTLR features
>  * All test coverage validates correct parsing behavior
> *Performance Impact:* None
>  * Grammar changes only affect compile-time parser generation
>  * No runtime performance impact
> *Compatibility:* Maintained
>  * No changes to OQL syntax or semantics
>  * Fully backward compatible with existing queries
> h2. Files to be Modified
>  # 
> {{geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g}}
>  (Grammar file)
>  # 
> {{geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java}}
>  (Test file)
> h2. References
>  * [ANTLR 2 Documentation - Syntactic 
> Predicates|https://www.antlr2.org/doc/predicates.html]
>  * [ANTLR 2 Documentation - Greedy 
> Subrules|https://www.antlr2.org/doc/options.html]
>  * Apache Geode OQL Documentation
> h2. Testing Strategy
> h3. Unit Tests
>  * Verify all query parser tests pass
>  * Verify aggregate function parsing works correctly
>  * Verify distinct keyword handling in aggregate functions
> h3. Integration Tests
>  * Run OQL query tests with aggregate functions
>  * Test queries with projection containing aggregate expressions
>  * Test queries mixing aggregate and non-aggregate projections
> h3. Regression Tests
>  * Run full geode-core test suite
>  * Verify no changes to generated parser behavior
>  * Confirm token numbering stability



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to