[
https://issues.apache.org/jira/browse/LUCENE-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andriy Redko updated LUCENE-10642:
----------------------------------
Description:
An interesting issue has been reported to the OpenSearch project [1]; it was
caused by [2] and [3]. In a nutshell, the regression causes escape sequences
(like \n, \r, \t, ...; see
[https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs])
to be treated as character classes. The problematic method is
RegExp::matchPredefinedCharacterClass, which does not account for characters
that denote an escaped construct. A simple test that reproduces the problem
fails with IllegalArgumentException("invalid character class"):
{noformat}
public class TestRegexpQuery extends LuceneTestCase {
  // ... existing fields, setup and the regexQueryNrHits(String) helper ...

  public void testEscapeSequences() throws IOException {
    assertEquals(1, regexQueryNrHits("\\n"));
    assertEquals(1, regexQueryNrHits("[\\n]"));
  }
}
{noformat}
[1] [https://github.com/opensearch-project/OpenSearch/issues/3781]
[2] [https://github.com/apache/lucene/commit/1efce5444dd40142c55c5a3a30eeebc7b86796c3]
[3] [https://github.com/apache/lucene/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce]
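To make the distinction concrete, here is a minimal Java sketch of how the character following a backslash could be classified. It is hypothetical and is not Lucene's RegExp code or its fix: the class, enum and method names are invented for illustration, it assumes \d, \D, \s, \S, \w and \W are the predefined classes being recognized, and it assumes the RegExp rule that a backslash followed by any other character denotes that character literally. Under those assumptions, \n is reported as an escaped literal rather than rejected as an invalid character class.

```java
// Hypothetical sketch, not Lucene's RegExp implementation: classify the
// character that follows a backslash in a regexp string.
public class EscapeClassifier {

    /** Kinds of escape this sketch distinguishes (hypothetical names). */
    enum Kind { PREDEFINED_CLASS, ESCAPED_LITERAL }

    /**
     * Returns PREDEFINED_CLASS only for \d, \D, \s, \S, \w and \W (assumed set).
     * Any other escaped character (e.g. \n, \r, \t, \[) is reported as an
     * escaped literal instead of being rejected as an "invalid character class".
     */
    static Kind classify(char escaped) {
        switch (escaped) {
            case 'd': case 'D':
            case 's': case 'S':
            case 'w': case 'W':
                return Kind.PREDEFINED_CLASS;
            default:
                return Kind.ESCAPED_LITERAL;
        }
    }

    public static void main(String[] args) {
        String[] patterns = {"\\d", "\\n", "\\t", "\\W"};
        for (String p : patterns) {
            // p.charAt(0) is the backslash; p.charAt(1) is the escaped character.
            System.out.println(p + " -> " + classify(p.charAt(1)));
        }
    }
}
```

Running main prints one line per pattern, e.g. "\n -> ESCAPED_LITERAL" and "\d -> PREDEFINED_CLASS". The point of the sketch is only that the class check must fall through to the escaped-literal case instead of throwing.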
> Regexp query: escape sequences are treated as character classes
> ---------------------------------------------------------------
>
> Key: LUCENE-10642
> URL: https://issues.apache.org/jira/browse/LUCENE-10642
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 9.0, 9.1, 9.2, 9.3
> Reporter: Andriy Redko
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)