Re: [PR] Terminate automaton after matched the whole prefix for PrefixQuery. [lucene]

via GitHub Mon, 15 Apr 2024 02:16:17 -0700


vsop-479 commented on code in PR #13072:
URL: https://github.com/apache/lucene/pull/13072#discussion_r1565442746



##########
lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java:
##########
@@ -96,6 +101,35 @@ protected RunAutomaton(Automaton a, int alphabetSize) {
     }
   }
 
+  /** Returns true if this state can accept everything(all remaining suffixes). */
+  private boolean canMatchAllSuffix(int state) {
+    assert automaton.isAccept(state);
+    int numTransitions = automaton.getNumTransitions(state);
+    // Apply to PrefixQuery, TermRangeQuery.
+    if (numTransitions == 1) {

Review Comment:
   I think I can detect a match-all-suffix state for `Regexp/WildcardQuery` 
in `UTF32ToUTF8.convert`, right after `convertOneEdge`, like this:
   ````
   // Writes new transitions into pendingTransitions:
   convertOneEdge(utf8State, destUTF8, scratch.min, scratch.max);
           
   // Set match all suffix state.
   if (scratch.min == 0 && scratch.max == 1114111 && utf8.isAccept(utf8State) && utf8.isAccept(destUTF8)) {
     utf8.setMatchAllSuffix(utf8State, true);
   }
   ````
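   To make the condition concrete, here is a minimal sketch of the same check 
on a toy automaton. The class `MatchAllSuffixSketch` and its methods are 
hypothetical stand-ins, not Lucene's `Automaton` API; only the condition itself 
(full code point range, source and destination both accepting) is taken from 
the snippet above.

```java
import java.util.BitSet;

/** Toy model for illustration only; this is NOT Lucene's Automaton class. */
final class MatchAllSuffixSketch {
  // 1114111 in the snippet above is the largest Unicode code point (0x10FFFF).
  static final int MAX_CODE_POINT = Character.MAX_CODE_POINT;

  private final BitSet accept = new BitSet();
  private final BitSet matchAllSuffix = new BitSet();

  /** Marks a state as accepting. Must be called before edges touching it are added. */
  void setAccept(int state) {
    accept.set(state);
  }

  /**
   * Registers an edge source --[min, max]--> dest (labels are code points) and
   * applies the proposed check: a full-range edge between two accept states
   * flags the source state as match-all-suffix.
   */
  void addEdge(int source, int dest, int min, int max) {
    if (min == 0 && max == MAX_CODE_POINT && accept.get(source) && accept.get(dest)) {
      matchAllSuffix.set(source);
    }
  }

  boolean isMatchAllSuffix(int state) {
    return matchAllSuffix.get(state);
  }

  public static void main(String[] args) {
    // Shape of a prefix query "a*": 0 --'a'--> 1, then 1 loops on any code point.
    MatchAllSuffixSketch automaton = new MatchAllSuffixSketch();
    automaton.setAccept(1);
    automaton.addEdge(0, 1, 'a', 'a');
    automaton.addEdge(1, 1, 0, MAX_CODE_POINT);
    System.out.println("state 0: " + automaton.isMatchAllSuffix(0));
    System.out.println("state 1: " + automaton.isMatchAllSuffix(1));
  }
}
```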
   This check is simple and reliable, but it conflicts with the concern quoted below:
   
   > Everything else about Automaton today is fundamental (states, transitions, 
isAccept) and necessary, but this new member is more a best effort optimization?
   
   Alternatively, in `RunAutomaton` I can check whether a candidate state 
eventually ends on an accept state via the [128, 191] transitions, which are 
added in `UTF32ToUTF8.all`.
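
   For context on the [128, 191] range: it is the UTF-8 continuation byte range 
(0x80 to 0xBF), so every byte after the first one in a multi-byte encoding falls 
inside it. The sketch below only demonstrates that property with the JDK's own 
encoder; `Utf8ContinuationSketch` is a hypothetical helper and does not touch 
`RunAutomaton` or `UTF32ToUTF8`.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper, not part of Lucene. */
final class Utf8ContinuationSketch {

  /** Encodes a single (non-surrogate) code point to UTF-8. */
  static byte[] encode(int codePoint) {
    return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
  }

  /** True if every byte after the first lies in [128, 191], i.e. is a continuation byte. */
  static boolean trailingBytesInContinuationRange(int codePoint) {
    byte[] bytes = encode(codePoint);
    for (int i = 1; i < bytes.length; i++) {
      int unsigned = bytes[i] & 0xFF; // Java bytes are signed; widen to 0..255
      if (unsigned < 128 || unsigned > 191) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // One sample per encoded length (1 to 4 bytes), plus the largest code point.
    int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600, Character.MAX_CODE_POINT};
    for (int codePoint : samples) {
      System.out.println(
          "U+" + Integer.toHexString(codePoint).toUpperCase()
              + " length=" + encode(codePoint).length
              + " trailingInRange=" + trailingBytesInContinuationRange(codePoint));
    }
  }
}
```

   This is why a state reached only through [128, 191] edges sits in the middle 
of a multi-byte character: the check in `RunAutomaton` would have to follow 
those edges to see whether they land on an accept state.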



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

