rmuir opened a new pull request, #14227:
URL: https://github.com/apache/lucene/pull/14227

   string: `?+½]+]+Ř*+[\]ᖴﴁ.`
   
   expected: before #14193
   ```
   java.lang.IllegalArgumentException: expected ']' at position 17
   ```
   
   actual: after #14193
   ```
   REGEXP_CONCATENATION
     REGEXP_CONCATENATION
       REGEXP_CONCATENATION
         REGEXP_CONCATENATION
           REGEXP_CONCATENATION
             REGEXP_CONCATENATION
               REGEXP_CONCATENATION
                 REGEXP_CONCATENATION
                   REGEXP_REPEAT_MIN min=1
                     REGEXP_CHAR char=?
                   REGEXP_CHAR char=½
                 REGEXP_REPEAT_MIN min=1
                   REGEXP_CHAR char=]
               REGEXP_CHAR char=
             REGEXP_REPEAT_MIN min=1
               REGEXP_CHAR char=]
           REGEXP_REPEAT_MIN min=1
             REGEXP_REPEAT
               REGEXP_CHAR char=Ř
         REGEXP_CHAR_CLASS starts=[] ends=[]
       REGEXP_STRING string=ᖴﴁ
     REGEXP_ANYCHAR
   ```
   
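   The problem is that RegExp accepts too much instead of throwing an 
exception as it should. In the example above, `[\]` is parsed as an empty 
character class rather than as an unterminated class that begins with an 
escaped `]`. The leniency comes from `expandPreDefined()`, which intrudes on 
escape-character parsing in order to handle predefined character classes 
(e.g. `\s`), and which adds a lot of complexity to parsing.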
   
   Only invoke `expandPreDefined()` for the set of characters that it 
explicitly handles. This is consistent with the way `expandPreDefined()`'s 
complexity is contained elsewhere in the parser, such as in `parseSimpleExp()`.
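
   To illustrate the idea, here is a minimal standalone sketch. It is a toy 
parser, not Lucene's RegExp code: the class name, the method `parseClass()`, 
and the escape set `dDsSwW` are assumptions made for this example. It shows 
only the dispatch rule: a backslash is treated as a predefined class when the 
next character is in the known set, and as an ordinary escaped literal 
otherwise, so `[\]` cannot be mistaken for a closed, empty class.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy character-class parser (hypothetical; not the Lucene implementation). */
final class CharClassSketch {
  // Assumed set of escape letters that denote predefined classes in this sketch.
  private static final String PREDEFINED = "dDsSwW";

  /** Parses a bracketed class such as [a\d\]] and returns a description of each item. */
  static List<String> parseClass(String s) {
    if (s.isEmpty() || s.charAt(0) != '[') {
      throw new IllegalArgumentException("expected '[' at position 0");
    }
    int pos = 1;
    List<String> items = new ArrayList<>();
    while (pos < s.length() && s.charAt(pos) != ']') {
      char c = s.charAt(pos);
      if (c == '\\') {
        if (pos + 1 >= s.length()) {
          throw new IllegalArgumentException("unexpected end-of-string");
        }
        char next = s.charAt(pos + 1);
        if (PREDEFINED.indexOf(next) >= 0) {
          // Only these characters take the predefined-class path.
          items.add("class:" + next);
        } else {
          // Anything else is an escaped literal, including ']'.
          items.add("char:" + next);
        }
        pos += 2;
      } else {
        items.add("char:" + c);
        pos++;
      }
    }
    if (pos >= s.length()) {
      // Ran off the end without seeing an unescaped ']'.
      throw new IllegalArgumentException("expected ']' at position " + pos);
    }
    return items;
  }
}
```

   With this rule, `CharClassSketch.parseClass("[\\d\\]]")` yields one 
predefined class and one literal `]`, while `CharClassSketch.parseClass("[\\]ab")` 
throws because the class is never closed.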
   
   Add two parsing tests: `testEmptyClass()`, whose behavior is unchanged by 
this PR but which should have existed, and `testEscapedInvalidClass()`, which 
fails without the change.
   
   Closes #14224 

