[ https://issues.apache.org/jira/browse/LUCENE-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367488#comment-17367488 ]
Michael McCandless commented on LUCENE-10010: --------------------------------------------- {quote}NFA traversal is not necessarily slow - tracking multiple paths can slow it down a bit but typically it's still fast. {quote} I think there is a real tradeoff here, since this new NFA based RegexpQuery would need to do the powerset expansion/tracking as it intersects each term's character. For "easy" cases, e.g. where the NFA is nearly already a DFA and each powerset typically has only one NFA state to track, it's hopefully pretty close to DFA performance. For "adversarial" cases, which today Lucene punts on (throws {{TooComplexToDeterminizeException}}), the cost is likely rather high, e.g. tracking 10s of NFA states in each DFA state. But that's fine! Previously Lucene could not even handle such tricky regexps, and with this change, it will :) But we can test if that really matters in practice, on "real world" regexps/indices, once we build this NFA RegExp option > Should we have a NFA Query? > --------------------------- > > Key: LUCENE-10010 > URL: https://issues.apache.org/jira/browse/LUCENE-10010 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search > Affects Versions: main (9.0) > Reporter: Haoyu Zhai > Priority: Major > > Today when a {{RegexpQuery}} is created, it will be translated to NFA, > determinized to DFA and eventually become an {{AutomatonQuery}}, which is > very fast. However, not every NFA could be determinized to DFA easily, the > example given in LUCENE-9981 showed how easy could a short regexp break the > determinize process. > Maybe, instead of marking those kind of queries as adversarial cases, we > could make a new kind of NFA query, which execute directly on NFA and thus no > need to worry about determinize process or determinized DFA size. It should > be slower, but also makes those adversarial cases doable. > [This article|https://swtch.com/~rsc/regexp/regexp1.html] has provided a > simple but efficient way of searching over NFA, essentially it is a partial > determinize process that only determinize the necessary part of DFA. Maybe we > could give it a try? -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org