rmuir opened a new pull request, #14227: URL: https://github.com/apache/lucene/pull/14227
string: `?+½]+]+Ř*+[\]ᖴﴁ.`

expected: before #14193
```
java.lang.IllegalArgumentException: expected ']' at position 17
```

actual: after #14193
```
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_REPEAT_MIN min=1
REGEXP_CHAR char=?
REGEXP_CHAR char=½
REGEXP_REPEAT_MIN min=1
REGEXP_CHAR char=]
REGEXP_CHAR char=
REGEXP_REPEAT_MIN min=1
REGEXP_CHAR char=]
REGEXP_REPEAT_MIN min=1
REGEXP_REPEAT
REGEXP_CHAR char=Ř
REGEXP_CHAR_CLASS starts=[] ends=[]
REGEXP_STRING string=ᖴﴁ
REGEXP_ANYCHAR
```

The problem is that RegExp accepts too much instead of throwing an exception as it should.

The leniency in the parser comes from `expandPreDefined()`, which intrudes on escape-character parsing in order to handle predefined character classes (e.g. `\s`). It adds a lot of complexity to parsing.

Don't invoke `expandPreDefined()` except for the set of characters that it explicitly handles. This is also consistent with the way `expandPreDefined()`'s complexity is managed elsewhere in the parser, such as in `parseSimpleExp()`.

Add two parsing tests: `testEmptyClass()`, whose behavior is unchanged by this PR but should be covered, and `testEscapedInvalidClass()`, which fails without the change.

Closes #14224
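For readers unfamiliar with the parser, here is a minimal, self-contained sketch of the kind of guard described above. It is not Lucene's `RegExp` code: the class `CharClassSketch`, the method `parseClass`, the output format, and the exact set of predefined escapes are all assumptions made for illustration. The idea is that a backslash inside `[...]` is treated as a predefined class only when the next character is in an explicit set; otherwise it is an ordinary escaped literal, so `[\]` leaves the class unterminated and parsing fails.

```java
import java.util.Set;

/** Toy character-class parser (hypothetical; not Lucene's implementation). */
final class CharClassSketch {
    // Assumed set of escapes that name a predefined class in this sketch.
    private static final Set<Character> PREDEFINED = Set.of('d', 'D', 's', 'S', 'w', 'W');

    /**
     * Parses a bracketed class that starts at index 0 and returns a textual
     * description of its members. Throws IllegalArgumentException if the
     * class is never closed.
     */
    static String parseClass(String pattern) {
        if (pattern.isEmpty() || pattern.charAt(0) != '[') {
            throw new IllegalArgumentException("expected '[' at position 0");
        }
        StringBuilder members = new StringBuilder();
        int pos = 1;
        while (pos < pattern.length() && pattern.charAt(pos) != ']') {
            char c = pattern.charAt(pos);
            if (c == '\\' && pos + 1 < pattern.length()) {
                char next = pattern.charAt(pos + 1);
                if (PREDEFINED.contains(next)) {
                    // Only the explicitly handled characters expand to a class.
                    members.append("<class \\").append(next).append('>');
                } else {
                    // Any other escape is a plain literal, including "\]".
                    members.append("<literal ").append(next).append('>');
                }
                pos += 2;
            } else {
                members.append("<literal ").append(c).append('>');
                pos++;
            }
        }
        if (pos >= pattern.length()) {
            // Ran off the end without seeing an unescaped ']'.
            throw new IllegalArgumentException("expected ']' at position " + pos);
        }
        return members.toString();
    }
}
```

In this sketch, `parseClass("[\\s]")` yields a predefined class, `parseClass("[\\]]")` yields a literal `]`, and `parseClass("[\\]x")` throws because the escaped `]` does not close the class.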