[ https://issues.apache.org/jira/browse/LUCENE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107169#comment-17107169 ]
Jim Ferenczi commented on LUCENE-9370:
--------------------------------------

+1

> RegExpQuery should error for inappropriate use of \ character in input
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9370
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9370
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: master (9.0)
>            Reporter: Mark Harwood
>            Priority: Minor
>
> The RegExp class is too lenient in parsing user input which can confuse or
> mislead users and cause backwards compatibility issues as we enhance regex
> support.
> In normal regular expression syntax the backslash is used to:
> * escape a reserved character like \.
> * use certain unreserved characters in a shorthand context e.g. \d means
> digits [0-9]
>
> The leniency bug in RegExp is that it adds an extra rule to this list - any
> backslashed characters that don't satisfy the above rules are taken
> literally. For example, there's no reason to put a backslash in front of the
> letter "p" but we accept \p as the letter p.
> Java's Pattern class will throw a parse exception given a meaningless
> backslash like \p.
> We should too.
> In [Lucene-9336|https://issues.apache.org/jira/browse/LUCENE-9336] we added
> support for commonly supported regex expressions like `\d`. Sadly this is a
> breaking change because of the leniency that has allowed \d to be accepted as
> the letter d without an exception. Users were likely silently missing results
> they were hoping for and we made a BWC problem for ourselves in filling in
> the gaps.
> I propose we do like other RegEx parsers and error on inappropriate use of
> backslashes.
> This will be another breaking change so should target 9.0
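
To make the two backslash rules from the description concrete, here is a minimal Java sketch. It is not Lucene's RegExp parser and not the patch for this issue. The class name BackslashCheck, both method names, and the RESERVED and SHORTHAND character sets are hypothetical and chosen only for illustration; the real sets depend on the syntax flags RegExp is given. The first method shows how java.util.regex.Pattern treats an escape, the second shows one way a strict check could reject a backslash that is neither an escape of a reserved character nor a shorthand class.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class BackslashCheck {

  // Assumption: illustrative character sets only, not Lucene's actual syntax tables.
  private static final String RESERVED = ".?+*|{}[]()\"\\#@&<>~";
  private static final String SHORTHAND = "dDsSwW";

  /** Returns true if java.util.regex.Pattern compiles the expression without error. */
  static boolean acceptedByJavaPattern(String regex) {
    try {
      Pattern.compile(regex);
      return true;
    } catch (PatternSyntaxException e) {
      return false;
    }
  }

  /**
   * Hypothetical strict check: every backslash must be followed by a reserved
   * character (rule 1) or a shorthand letter (rule 2). Anything else is rejected
   * instead of being silently treated as a literal.
   */
  static void checkBackslashes(String regex) {
    for (int i = 0; i < regex.length(); i++) {
      if (regex.charAt(i) != '\\') {
        continue;
      }
      if (i + 1 >= regex.length()) {
        throw new IllegalArgumentException("trailing backslash at position " + i);
      }
      char next = regex.charAt(i + 1);
      if (RESERVED.indexOf(next) < 0 && SHORTHAND.indexOf(next) < 0) {
        throw new IllegalArgumentException(
            "invalid escape \\" + next + " at position " + i);
      }
      i++; // skip the escaped character so an escaped backslash is not re-read
    }
  }

  public static void main(String[] args) {
    // Escapes from the issue text: \. and \d are meaningful, \p on its own is not.
    for (String regex : new String[] {"\\.", "\\d", "\\p"}) {
      String strict;
      try {
        checkBackslashes(regex);
        strict = "accepted";
      } catch (IllegalArgumentException e) {
        strict = "rejected (" + e.getMessage() + ")";
      }
      System.out.println(regex + " -> Pattern accepts: "
          + acceptedByJavaPattern(regex) + ", strict check: " + strict);
    }
  }
}
```

The sketch only scans for escapes and ignores context such as character classes or quoted strings, so it shows the shape of the proposed error rather than a drop-in validator.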