Re: [PR] Terminate automaton after matched the whole prefix for PrefixQuery. [lucene]

via GitHub Thu, 07 Mar 2024 01:49:13 -0800


vsop-479 commented on code in PR #13072:
URL: https://github.com/apache/lucene/pull/13072#discussion_r1515861738



##########
lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java:
##########
@@ -96,6 +101,35 @@ protected RunAutomaton(Automaton a, int alphabetSize) {
     }
   }
 
+  /** Returns true if this state can accept everything(all remaining suffixes). */
+  private boolean canMatchAllSuffix(int state) {
+    assert automaton.isAccept(state);
+    int numTransitions = automaton.getNumTransitions(state);
+    // Apply to PrefixQuery, TermRangeQuery.
+    if (numTransitions == 1) {

Review Comment:
   > we need to figure out why Regexp/WildcardQuery are compiling down to 127 as their max on .* suffix transitions?
   
   For these queries (including `AutomatonQuery`), the `Automaton` looks like this: [0, 127]: 3 -> 3; [194, 194]: 3 -> 4; [128, 191]: 4 -> 3, assuming 3 is an accept state.
   It is more complex to detect whether a state can accept all remaining suffixes for these queries, because the transitions out of an accept state are split into many ranges, such as [0, 127], [194, 223], [224, 239], [240, 243], [244], etc.
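
To illustrate the difference, here is a minimal, self-contained sketch of the simple single-transition case next to the split-range case described above. It does not use Lucene's `Automaton` API: the `Transition` class, the `singleSelfLoopCoversAlphabet` helper, and the state numbers are hypothetical and exist only for this example. It assumes a binary automaton whose accept state has one self-loop over [0, 255], which is how I understand the `PrefixQuery` case.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model for illustration; not Lucene's Automaton API. */
final class SuffixCheckSketch {

  /** One labeled transition: labels in [min, max] move from source to dest. */
  static final class Transition {
    final int source;
    final int min;
    final int max;
    final int dest;

    Transition(int source, int min, int max, int dest) {
      this.source = source;
      this.min = min;
      this.max = max;
      this.dest = dest;
    }
  }

  /**
   * Simple case only: the state has exactly one outgoing transition, it loops
   * back to the same state, and it covers the whole alphabet [0, alphabetSize - 1].
   */
  static boolean singleSelfLoopCoversAlphabet(
      List<Transition> outgoing, int state, int alphabetSize) {
    if (outgoing.size() != 1) {
      return false;
    }
    Transition t = outgoing.get(0);
    return t.dest == state && t.min == 0 && t.max == alphabetSize - 1;
  }

  public static void main(String[] args) {
    // Binary automaton (assumed PrefixQuery-like): [0, 255]: 1 -> 1
    List<Transition> binary = Arrays.asList(new Transition(1, 0, 255, 1));

    // UTF-8 automaton from the comment, transitions leaving accept state 3:
    // [0, 127]: 3 -> 3; [194, 194]: 3 -> 4
    List<Transition> utf8 =
        Arrays.asList(new Transition(3, 0, 127, 3), new Transition(3, 194, 194, 4));

    System.out.println(singleSelfLoopCoversAlphabet(binary, 1, 256)); // prints true
    System.out.println(singleSelfLoopCoversAlphabet(utf8, 3, 256)); // prints false
  }
}
```

The second call returns false even though state 3 is meant to accept any remaining suffix, because multi-byte sequences leave the accept state and come back through an intermediate state. A general check would have to follow those intermediate states, which this sketch does not attempt.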
   
   I am still working on this; any suggestions are welcome, @mikemccand.
   
   > Perhaps we could also add tests cases for custom Automata passed to AutomatonQuery matching sometimes binary (non-UTF8) terms?
   
   Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

