[ 
https://issues.apache.org/jira/browse/LUCENE-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453260#comment-17453260
 ] 

ASF subversion and git services commented on LUCENE-10010:
----------------------------------------------------------

Commit b2e866b70366c5623e4208898dfa22afdace8d99 in lucene's branch 
refs/heads/main from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b2e866b ]

LUCENE-10010: don't determinize in CompiledAutomaton/RunAutomaton (#485)

Instead, require that incoming automata is determinized by the caller, throwing 
an exception if it isn't.

This paves the way for NFA execution in the future: if you pass an NFA to 
AutomatonQuery, we should use the NFA algorithm on it. No need for lots of 
booleans or enums.

In the meantime, it cleans up plenty of APIs by not having to plumb 
`determinizeWorkLimit` parameters down to the guts.

> Should we have a NFA Query?
> ---------------------------
>
>                 Key: LUCENE-10010
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10010
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 9.0
>            Reporter: Haoyu Zhai
>            Priority: Major
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Today when a {{RegexpQuery}} is created, it will be translated to NFA, 
> determinized to DFA and eventually become an {{AutomatonQuery}}, which is 
> very fast. However, not every NFA could be determinized to DFA easily, the 
> example given in LUCENE-9981 showed how easy could a short regexp break the 
> determinize process.
> Maybe, instead of marking those kind of queries as adversarial cases, we 
> could make a new kind of NFA query, which execute directly on NFA and thus no 
> need to worry about determinize process or determinized DFA size. It should 
> be slower, but also makes those adversarial cases doable.
> [This article|https://swtch.com/~rsc/regexp/regexp1.html] has provided a 
> simple but efficient way of searching over NFA, essentially it is a partial 
> determinize process that only determinize the necessary part of DFA. Maybe we 
> could give it a try?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to