Robert Muir created LUCENE-10296: ------------------------------------ Summary: Stop minimizing regexps Key: LUCENE-10296 URL: https://issues.apache.org/jira/browse/LUCENE-10296 Project: Lucene - Core Issue Type: Task Affects Versions: 10.0 (main) Reporter: Robert Muir
In current trunk, we let caller (e.g. RegExpQuery) try to "reduce" the expression. The parser nor the low-level executors don't implicitly call exponential-time algorithms anymore. But now that we have cleaned this up, we can see it is even worse than just calling {{{}determinize(){}}}. We still call {{minimize()}} which is much crazier and much more. We stopped doing this for all other AutomatonQuery subclasses a long time ago, as we determined that it didn't help performance. Additionally, minimization vs. determinization is even less important than early days where we found trouble: the representation got a lot better. Today when you {{finishState}} we do a lot of practical sorting/coalescing on-the-fly. Also we added this fancy UTF32-to-UTF8 automata convertor, that makes the worst-case-space-per-state significantly lower than it was before? So why {{minimize()}} ? Let's just replace {{minimize()}} calls with {{determinize()}} calls? I've already swapped them out for all of {{{}src/test{}}}, to get jenkins looking for issues ahead of time. -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org