[ 
https://issues.apache.org/jira/browse/LUCENE-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563382#comment-17563382
 ] 

Uwe Schindler edited comment on LUCENE-10642 at 7/6/22 6:36 PM:
----------------------------------------------------------------

bq. From the user perspective, is it non-intuitive why the character classes 
should be denoted with two slashes

That's only in Java code (the usual stupidity) and possibly JSON. The problem 
is that if you write "\n", the Java compiler turns it into a newline character, 
so there's never a \n in the regular expression.

Actually, it is a problem if you can't write {{\\n}}, as this is what the 
parser sees as \n.
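
To illustrate the two escaping layers, here is a minimal sketch. It uses java.util.regex.Pattern as a stand-in, not Lucene's RegExp parser, so it only shows what the Java compiler hands to a regex engine, not how Lucene interprets it. The class name EscapeDemo and the helper matchesNewline are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows what a regex parser receives for "\n" vs "\\n".
final class EscapeDemo {
    // Java source "\n": the compiler emits ONE character, a line feed (U+000A).
    static final String COMPILED_NEWLINE = "\n";

    // Java source "\\n": the compiler emits TWO characters, a backslash and 'n'.
    // Only in this form does the regex parser ever see the escape sequence \n.
    static final String BACKSLASH_N = "\\n";

    // True if the given regex (as java.util.regex parses it) matches a single newline.
    static boolean matchesNewline(String regex) {
        return Pattern.compile(regex).matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println("length of \"\\n\" literal:   " + COMPILED_NEWLINE.length());
        System.out.println("length of \"\\\\n\" literal:  " + BACKSLASH_N.length());
        // The raw newline character matches itself; the two-character escape
        // is resolved to a newline by the regex parser, not by the compiler.
        System.out.println("raw newline matches:      " + matchesNewline(COMPILED_NEWLINE));
        System.out.println("escape sequence matches:  " + matchesNewline(BACKSLASH_N));
    }
}
```

With java.util.regex both forms end up matching a newline, but only the second one reaches the parser as an escape sequence, which is the case the issue is about.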


> Regexp query: escape sequences are treated as character classes
> ---------------------------------------------------------------
>
>                 Key: LUCENE-10642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10642
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 9.0, 9.1, 9.2, 9.3
>            Reporter: Andriy Redko
>            Priority: Major
>
> An interesting issue has been reported to the OpenSearch project [1], which 
> was caused by [2], [3]. In a nutshell, the regression causes escape 
> sequences (like \n, \r, \t, ...) to be treated as character classes 
> (specifically, 
> [https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs]).
> The problematic function is RegExp::matchPredefinedCharacterClass, which does 
> not consider characters that denote an escaped construct. A simple test to 
> reproduce, which fails with IllegalArgumentException("invalid character 
> class"):
>  
> {noformat}
> public class TestRegexpQuery extends LuceneTestCase {
>   public void testEscapeSequences() throws IOException {
>     assertEquals(1, regexQueryNrHits("\\n"));
>     assertEquals(1, regexQueryNrHits("[\\n]"));
>   }
> }
>   {noformat}
>  
> [1] [https://github.com/opensearch-project/OpenSearch/issues/3781]
> [2] 
> [https://github.com/apache/lucene/commit/1efce5444dd40142c55c5a3a30eeebc7b86796c3]
> [3] 
> [https://github.com/apache/lucene/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce]



