Andriy Redko created LUCENE-10642:
-------------------------------------

             Summary: Regexp query: escape sequences are treated as character classes
                 Key: LUCENE-10642
                 URL: https://issues.apache.org/jira/browse/LUCENE-10642
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 9.0
            Reporter: Andriy Redko


An interesting issue has been reported to the OpenSearch project [1]; it was 
caused by [2] and [3]. In a nutshell, the regression causes escape sequences 
(like \n, \r, \t, ...) to be treated as character classes (specifically, see 
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs).

 

The problematic function is RegExp::matchPredefinedCharacterClass, which does 
not consider characters that denote an escaped construct.
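
For illustration only, here is a minimal standalone sketch of the distinction 
the parser would need to make after a backslash. It is not Lucene's code: the 
class EscapeSketch, the Kind enum, the classify method, and the assumed set of 
class letters (d, D, s, S, w, W) are all hypothetical names and choices made 
for this example.

```java
/** Hypothetical illustration; NOT Lucene's RegExp implementation. */
final class EscapeSketch {

  /** What the character following a backslash denotes in this sketch. */
  enum Kind { PREDEFINED_CLASS, ESCAPED_CHAR }

  // Assumption for this sketch: only the digit, whitespace and word
  // classes (and their negations) count as predefined character classes.
  private static final String PREDEFINED = "dDsSwW";

  /** Classifies the character that follows a backslash. */
  static Kind classify(char afterBackslash) {
    return PREDEFINED.indexOf(afterBackslash) >= 0
        ? Kind.PREDEFINED_CLASS
        : Kind.ESCAPED_CHAR; // e.g. n, r, t: an escaped construct, not a class
  }

  public static void main(String[] args) {
    // Print the classification of a few escapes mentioned in this report.
    for (char c : new char[] {'d', 'w', 'n', 'r', 't'}) {
      System.out.println("\\" + c + " -> " + classify(c));
    }
  }
}
```

In this sketch, anything outside the assumed class letters falls through to 
the escaped-character branch instead of raising an error, which is the 
behaviour the failing test expects for \n.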

 

A simple test to reproduce, which fails with 
IllegalArgumentException("invalid character class"):

 

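```
public class TestRegexpQuery extends LuceneTestCase {

  // Uses the regexQueryNrHits helper from the existing TestRegexpQuery class.
  public void testEscapeSequences() throws IOException {
    assertEquals(1, regexQueryNrHits("\\n"));   // regexp: \n
    assertEquals(1, regexQueryNrHits("[\\n]")); // regexp: [\n]
  }

}
```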

 

[1] https://github.com/opensearch-project/OpenSearch/issues/3781
[2] https://github.com/apache/lucene/commit/1efce5444dd40142c55c5a3a30eeebc7b86796c3
[3] https://github.com/apache/lucene/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce


