rmuir commented on pull request #528: URL: https://github.com/apache/lucene/pull/528#issuecomment-989454186
LUCENE-10296: Stop minimizing regexps

In current trunk, we let the caller (e.g. `RegExpQuery`) try to "reduce" the expression. Neither the parser nor the low-level executors implicitly call exponential-time algorithms anymore.

But now that we have cleaned this up, we can see that what is happening is even worse than just calling `determinize()`. We still call `minimize()`, which is much crazier and does much more work.

We stopped doing this for all other `AutomatonQuery` subclasses a long time ago, as we determined that it didn't help performance. Additionally, minimization vs. determinization is even less important than in the early days when we ran into trouble, because the representation got a lot better. Today when you call `finishState()` we do a lot of practical sorting/coalescing on the fly: the practical parts of minimization for runtime performance. We also added our fancy UTF32-to-UTF8 automata converter, which makes the worst-case space per state significantly lower than with a UTF-16 representation. So why `minimize()`?

Let's just replace the `minimize()` calls with `determinize()` calls. I've already swapped them out for all of `src/test`, to get Jenkins looking for issues ahead of time.

This change moves Hopcroft minimization (`MinimizeOperations`) to `src/test` for now. I'd like to explore nuking it from there as a next step: any tests that truly need minimization should be fine with Brzozowski's algorithm, which is a 2-liner.

I think the problem is understood, longs are insane for docids, and I don't wish to hold changes up on stupid stuff....
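For reference, the Brzozowski remark can be illustrated with a minimal, self-contained sketch: reverse, determinize, reverse, determinize. The `Nfa` type and every method name below are hypothetical and written only for this illustration; they are not Lucene's `Automaton` API. In Lucene the same idea would presumably be composed from `Operations.reverse` and `Operations.determinize`, which this sketch does not use.

```java
import java.util.*;

final class BrzozowskiSketch {
  /** Tiny automaton: states are ints, labels are chars. Hypothetical type, not Lucene's Automaton. */
  static final class Nfa {
    final Set<Integer> initial = new HashSet<>();
    final Set<Integer> accept = new HashSet<>();
    final Map<Integer, Map<Character, Set<Integer>>> delta = new HashMap<>();
    int numStates;

    int newState() { return numStates++; }

    void addTransition(int from, char label, int to) {
      delta.computeIfAbsent(from, k -> new TreeMap<>())
           .computeIfAbsent(label, k -> new HashSet<>()).add(to);
    }
  }

  /** Flips every transition and swaps initial and accepting states. */
  static Nfa reverse(Nfa a) {
    Nfa r = new Nfa();
    r.numStates = a.numStates;
    r.initial.addAll(a.accept);
    r.accept.addAll(a.initial);
    a.delta.forEach((from, byLabel) ->
        byLabel.forEach((label, targets) ->
            targets.forEach(to -> r.addTransition(to, label, from))));
    return r;
  }

  /** Subset construction over reachable subsets only; the empty subset is never materialized. */
  static Nfa determinize(Nfa a) {
    Nfa d = new Nfa();
    if (a.initial.isEmpty()) return d; // empty language
    Map<Set<Integer>, Integer> ids = new HashMap<>();
    Deque<Set<Integer>> work = new ArrayDeque<>();
    Set<Integer> start = new TreeSet<>(a.initial);
    ids.put(start, d.newState());
    d.initial.add(0);
    work.add(start);
    while (!work.isEmpty()) {
      Set<Integer> subset = work.poll();
      int id = ids.get(subset);
      if (!Collections.disjoint(subset, a.accept)) d.accept.add(id);
      // Union the targets of all member states, grouped by label.
      Map<Character, Set<Integer>> moves = new TreeMap<>();
      for (int s : subset) {
        a.delta.getOrDefault(s, Collections.emptyMap()).forEach((label, targets) ->
            moves.computeIfAbsent(label, k -> new TreeSet<>()).addAll(targets));
      }
      for (Map.Entry<Character, Set<Integer>> e : moves.entrySet()) {
        Integer targetId = ids.get(e.getValue());
        if (targetId == null) {
          targetId = d.newState();
          ids.put(e.getValue(), targetId);
          work.add(e.getValue());
        }
        d.addTransition(id, e.getKey(), targetId);
      }
    }
    return d;
  }

  /** Brzozowski's algorithm: the "2-liner" once reverse and determinize exist. */
  static Nfa minimize(Nfa a) {
    return determinize(reverse(determinize(reverse(a))));
  }

  /** Simulates the automaton on s; used only to check the language is preserved. */
  static boolean accepts(Nfa a, String s) {
    Set<Integer> cur = new HashSet<>(a.initial);
    for (char c : s.toCharArray()) {
      Set<Integer> next = new HashSet<>();
      for (int q : cur) {
        next.addAll(a.delta.getOrDefault(q, Collections.emptyMap())
                           .getOrDefault(c, Collections.emptySet()));
      }
      cur = next;
    }
    return !Collections.disjoint(cur, a.accept);
  }

  /** A deliberately redundant 3-state DFA for the language a+. */
  static Nfa example() {
    Nfa a = new Nfa();
    int s0 = a.newState(), s1 = a.newState(), s2 = a.newState();
    a.initial.add(s0);
    a.accept.add(s1);
    a.accept.add(s2);
    a.addTransition(s0, 'a', s1);
    a.addTransition(s1, 'a', s2);
    a.addTransition(s2, 'a', s1);
    return a;
  }

  public static void main(String[] args) {
    Nfa a = example();
    Nfa m = minimize(a);
    System.out.println(a.numStates + " states before, " + m.numStates + " after");
  }
}
```

Note that both passes are plain subset constructions, so this route is still exponential in the worst case; that seems acceptable only for the small automata a test would build.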
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org