Andriy Redko created LUCENE-10642:
-------------------------------------

             Summary: Regexp query: escape sequences are treated as character classes
                 Key: LUCENE-10642
                 URL: https://issues.apache.org/jira/browse/LUCENE-10642
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 9.0
            Reporter: Andriy Redko

An interesting issue has been reported to the OpenSearch project [1]; it was caused by [2] and [3]. In a nutshell, the regression causes escape sequences (such as \n, \r, \t, ...) to be treated as character classes (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs). The problematic function is RegExp::matchPredefinedCharacterClass, which does not account for characters that denote an escaped construct rather than a predefined class.

A simple test to reproduce, which fails with IllegalArgumentException("invalid character class"):

```
// Excerpt from TestRegexpQuery; regexQueryNrHits is a helper defined in that class.
public class TestRegexpQuery extends LuceneTestCase {
  public void testEscapeSequences() throws IOException {
    // The Java literal "\\n" is the two-character regexp \n (backslash, then 'n').
    assertEquals(1, regexQueryNrHits("\\n"));
    // The same escape inside a bracket expression.
    assertEquals(1, regexQueryNrHits("[\\n]"));
  }
}
```

[1] https://github.com/opensearch-project/OpenSearch/issues/3781
[2] https://github.com/apache/lucene/commit/1efce5444dd40142c55c5a3a30eeebc7b86796c3
[3] https://github.com/apache/lucene/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce
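
To make the distinction described above concrete, here is a minimal, self-contained sketch. It is not Lucene's code and not the proposed fix. The class `EscapeKindSketch`, its `Kind` enum, and the methods `classify` and `scan` are hypothetical names introduced only for illustration. The set of shorthand classes it recognizes (d, D, s, S, w, W) is an assumption as well. The sketch shows one way a parser could decide, after consuming a backslash, whether the next character names a predefined character class or is an ordinary escaped character, instead of rejecting everything that is not a class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene. */
final class EscapeKindSketch {

  /** What a backslash-prefixed token denotes in this sketch. */
  enum Kind { PREDEFINED_CLASS, ESCAPED_CHAR }

  // Assumption: only these shorthand letters name a character class.
  private static final String PREDEFINED = "dDsSwW";

  /** Classifies the character that follows a backslash. */
  static Kind classify(char afterBackslash) {
    return PREDEFINED.indexOf(afterBackslash) >= 0
        ? Kind.PREDEFINED_CLASS
        : Kind.ESCAPED_CHAR; // e.g. \n, \r, \t, \\ fall through here
  }

  /** Returns the kind of every backslash escape found in the pattern, in order. */
  static List<Kind> scan(String pattern) {
    List<Kind> kinds = new ArrayList<>();
    for (int i = 0; i < pattern.length(); i++) {
      if (pattern.charAt(i) == '\\') {
        if (i + 1 >= pattern.length()) {
          throw new IllegalArgumentException("dangling backslash at end of pattern");
        }
        // Consume the escaped character so it is not inspected twice.
        kinds.add(classify(pattern.charAt(++i)));
      }
    }
    return kinds;
  }

  public static void main(String[] args) {
    // The two patterns from the reproduction test, plus one with a real class.
    System.out.println(scan("\\n"));
    System.out.println(scan("[\\n]"));
    System.out.println(scan("\\d\\t"));
  }
}
```

Under these assumptions, both patterns from the reproduction test contain a single escape that is classified as an escaped character rather than a character class, so neither would reach an "invalid character class" error path.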