[ 
https://issues.apache.org/jira/browse/LUCENE-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy Redko updated LUCENE-10642:
----------------------------------
    Description: 
An interesting issue has been reported to the OpenSearch project [1], caused by 
[2] and [3]. In a nutshell, the regression causes escape sequences 
(like \n, \r, \t, ...) to be treated as character classes (see 
[https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs]).

 

The problematic function is RegExp::matchPredefinedCharacterClass, which does 
not account for characters that denote an escaped construct. A simple test to 
reproduce, which fails with IllegalArgumentException("invalid character 
class"):

 
{noformat}
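public class TestRegexpQuery extends LuceneTestCase {
  public void testEscapeSequences() throws IOException {
    // Each pattern contains a backslash followed by 'n'; one hit is expected.
    assertEquals(1, regexQueryNrHits("\\n"));
    // The same escape placed inside a bracket expression.
    assertEquals(1, regexQueryNrHits("[\\n]"));
  }
}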
  {noformat}
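
To illustrate the distinction described above, here is a minimal sketch that does not use Lucene at all. The class name EscapeClassifierSketch, the Kind enum and the classify method are hypothetical names introduced only for this example, and the set of predefined class letters (d, D, s, S, w, W) is an assumption. It is not the actual RegExp implementation or its fix; it only shows the idea of classifying a backslash escape instead of rejecting every character that is not a predefined class letter.

```java
// Hypothetical illustration only; not Lucene code.
final class EscapeClassifierSketch {

  public enum Kind { PREDEFINED_CLASS, ESCAPED_CHAR, NOT_AN_ESCAPE }

  // Assumption: letters that denote predefined character classes after a backslash.
  private static final String PREDEFINED = "dDsSwW";

  // Classifies the construct starting at pos in pattern.
  public static Kind classify(String pattern, int pos) {
    // Not a backslash, or a backslash with nothing after it.
    if (pos < 0 || pos + 1 >= pattern.length() || pattern.charAt(pos) != '\\') {
      return Kind.NOT_AN_ESCAPE;
    }
    char next = pattern.charAt(pos + 1);
    // Any other escaped character is reported as an escape rather than
    // being rejected as an invalid character class.
    return PREDEFINED.indexOf(next) >= 0 ? Kind.PREDEFINED_CLASS : Kind.ESCAPED_CHAR;
  }

  public static void main(String[] args) {
    System.out.println(classify("\\d", 0));
    System.out.println(classify("\\n", 0));
    System.out.println(classify("[\\n]", 1));
  }
}
```

The two patterns from the test above, "\\n" and "[\\n]", both fall into the ESCAPED_CHAR branch in this sketch, which is the case the report says is currently rejected.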
 

[1] [https://github.com/opensearch-project/OpenSearch/issues/3781]
[2] 
[https://github.com/apache/lucene/commit/1efce5444dd40142c55c5a3a30eeebc7b86796c3]
[3] 
[https://github.com/apache/lucene/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce]


> Regexp query: escape sequences are treated as character classes
> ---------------------------------------------------------------
>
>                 Key: LUCENE-10642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10642
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 9.0, 9.1, 9.2, 9.3
>            Reporter: Andriy Redko
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
