[ https://issues.apache.org/jira/browse/LUCENE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272129#comment-17272129 ]
Michael McCandless commented on LUCENE-9696:
--------------------------------------------

Thanks [~gus]!

We could separately consider adding group support to FSTs and to Lucene's {{Automaton}} classes, which are two separate implementations of fun finite-state algorithms.

> RegExp with group references
> ----------------------------
>
>                 Key: LUCENE-9696
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9696
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Gus Heck
>            Priority: Minor
>
> PatternTypingFilter presently relies on java.util regexes, but LUCENE-7465
> found performance benefits using our own RegExp class instead. Unfortunately,
> RegExp does not currently report matching subgroups, which is key to
> PatternTypingFilter's use (and probably useful in other endeavors as well).
> What's needed is reporting of subgroups such that
>
>   new RegExp("foo(.+)") --> converted to run automaton etc. --> match found
>   for "foobar" --> somehow reports getGroup(1) as "bar"
>
> and getGroup() can be called on some object reasonably accessible to the code
> using RegExp in the first place.
> Clearly there's a lot to be worked out there, since the normal usage pattern
> converts things to a DFA / run automaton etc., and subgroups are not a natural
> concept for those classes. But if this could be achieved without losing the
> performance benefits, that would be interesting :).
> Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I
> won't be able to work on it any time soon, so I encourage anyone else
> interested to pick it up or to drop links or ideas in here.
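
For reference, the behavior the issue asks for can be illustrated with plain java.util.regex, which PatternTypingFilter relies on today. This is only a minimal sketch of the desired contract, not a proposal for how Lucene's RegExp would implement it; the class name GroupDemo and the helper firstGroup are hypothetical names chosen for illustration. As far as I know, running a compiled Lucene automaton over a string (e.g. CharacterRunAutomaton.run(String)) yields only a boolean, which is exactly the gap described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows the group-reporting contract using java.util.regex,
// NOT Lucene's RegExp (which does not report subgroups).
final class GroupDemo {

    /** Returns capture group 1 if the whole input matches the regex, otherwise null. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // matches() requires the entire input to match, like running an automaton over it.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup("foo(.+)", "foobar")); // prints "bar"
        System.out.println(firstGroup("foo(.+)", "barfoo")); // prints "null" (no match)
    }
}
```

Any automaton-based equivalent would need to return something richer than a boolean, for example a match object exposing getGroup(int), while keeping the DFA's speed on the accept/reject decision.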