Robert Muir created LUCENE-10296:
------------------------------------

             Summary: Stop minimizing regexps
                 Key: LUCENE-10296
                 URL: https://issues.apache.org/jira/browse/LUCENE-10296
             Project: Lucene - Core
          Issue Type: Task
    Affects Versions: 10.0 (main)
            Reporter: Robert Muir


In current trunk, we let the caller (e.g. RegExpQuery) try to "reduce" the 
expression. Neither the parser nor the low-level executors implicitly call 
exponential-time algorithms anymore.

But now that we have cleaned this up, we can see it is even worse than just 
calling {{determinize()}}. We still call {{minimize()}}, which is much 
crazier and much more. 

We stopped doing this for all other AutomatonQuery subclasses a long time ago, 
as we determined that it didn't help performance. Additionally, minimization 
vs. determinization matters even less than it did in the early days, when we 
ran into trouble: the representation got a lot better. Today when you call 
{{finishState}} we do a lot of practical sorting/coalescing on-the-fly. We also 
added this fancy UTF32-to-UTF8 automata converter, which makes the worst-case 
space per state significantly lower than it was before. So why {{minimize()}}?

Let's just replace {{minimize()}} calls with {{determinize()}} calls? I've 
already swapped them out for all of {{src/test}}, to get Jenkins looking 
for issues ahead of time.
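
To illustrate why determinizing alone is enough for matching, here is a 
minimal, self-contained sketch of subset construction on a toy NFA for 
"ab|ac". It is not Lucene's implementation and uses no Lucene APIs: the class 
{{DeterminizeSketch}}, its {{Dfa}} type and the edge table are all hypothetical 
names made up for this example. The resulting DFA has 4 states where a minimal 
one would have 3 (the two accepting states are equivalent), yet it accepts 
exactly the same strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy example; not Lucene code.
class DeterminizeSketch {
    // Toy NFA for "ab|ac", built naively as two separate branches.
    // Each edge is {source, label, target}; state 0 is the start state.
    static final int[][] NFA_EDGES = {{0, 'a', 1}, {0, 'a', 3}, {1, 'b', 2}, {3, 'c', 4}};
    static final Set<Integer> NFA_ACCEPT = Set.of(2, 4);

    /** Deterministic automaton: at most one target per (state, label). */
    static final class Dfa {
        final List<Map<Character, Integer>> transitions = new ArrayList<>();
        final List<Boolean> accept = new ArrayList<>();

        // A DFA state accepts if any NFA state in its subset accepts.
        int addState(Set<Integer> nfaStates) {
            transitions.add(new TreeMap<>());
            accept.add(nfaStates.stream().anyMatch(NFA_ACCEPT::contains));
            return transitions.size() - 1;
        }

        boolean matches(String input) {
            int state = 0;
            for (char c : input.toCharArray()) {
                Integer next = transitions.get(state).get(c);
                if (next == null) {
                    return false; // no transition for this label: reject
                }
                state = next;
            }
            return accept.get(state);
        }
    }

    /** Subset construction: each DFA state is a set of NFA states. */
    static Dfa determinize() {
        Dfa dfa = new Dfa();
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>(List.of(0));
        ids.put(start, dfa.addState(start));
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.poll();
            // Group the NFA targets reachable from this subset by label.
            Map<Character, Set<Integer>> targets = new TreeMap<>();
            for (int[] edge : NFA_EDGES) {
                if (current.contains(edge[0])) {
                    targets.computeIfAbsent((char) edge[1], k -> new TreeSet<>()).add(edge[2]);
                }
            }
            for (Map.Entry<Character, Set<Integer>> e : targets.entrySet()) {
                Integer id = ids.get(e.getValue());
                if (id == null) { // first time we see this subset
                    id = dfa.addState(e.getValue());
                    ids.put(e.getValue(), id);
                    work.add(e.getValue());
                }
                dfa.transitions.get(ids.get(current)).put(e.getKey(), id);
            }
        }
        return dfa;
    }

    public static void main(String[] args) {
        Dfa dfa = determinize();
        // Subsets {0}, {1,3}, {2}, {4}: minimization would merge {2} and {4}.
        System.out.println("states: " + dfa.transitions.size());
        System.out.println("ab: " + dfa.matches("ab") + ", ac: " + dfa.matches("ac"));
        System.out.println("a: " + dfa.matches("a") + ", abc: " + dfa.matches("abc"));
    }
}
```

Matching walks one transition per input character either way, so the extra 
state only costs a little space. Whether that trade-off holds for real 
regexps is exactly what the swapped-out tests are meant to check.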



