rmuir commented on pull request #528:
URL: https://github.com/apache/lucene/pull/528#issuecomment-989454186


   LUCENE-10296: Stop minimizing regexps
   
   In current trunk, we let the caller (e.g. `RegExpQuery`) try to "reduce" the 
expression. Neither the parser nor the low-level executors implicitly call 
exponential-time algorithms anymore.
   
   But now that we have cleaned this up, we can see that what is happening is 
even worse than just calling `determinize()`. We still call `minimize()`, which 
is much crazier and does much more work.
   
   We stopped doing this for all other `AutomatonQuery` subclasses a long time 
ago, as we determined that it didn't help performance. Additionally, 
minimization vs. determinization matters even less now than in the early days 
when we ran into trouble, because the representation got a lot better. Today 
when you call `finishState()` we do a lot of practical sorting/coalescing 
on-the-fly: the practical parts of minimization for runtime perf. We also added 
our fancy UTF32-to-UTF8 automata converter, which makes the worst-case space 
per state significantly lower than with the UTF-16 representation. So why 
`minimize()`?
   
   Let's just replace `minimize()` calls with `determinize()` calls? I've 
already swapped them out for all of `src/test`, to get Jenkins looking for 
issues ahead of time.
   
   This change moves Hopcroft minimization (`MinimizationOperations`) to 
`src/test` for now. I'd like to explore nuking it from there as a next step: 
any tests that truly need minimization should be fine with Brzozowski's 
algorithm, which is a 2-liner.
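   
   For reference, a rough sketch of why Brzozowski's algorithm is so short: 
reverse, determinize, reverse, determinize. This runs on a toy automaton class, 
not on Lucene's `Automaton` API; `Nfa`, `reverse`, `determinize`, `accepts` and 
`example` are hypothetical stand-ins written only for illustration. The subset 
construction here keeps only reachable, non-empty subsets, so the result has no 
dead state.

```java
import java.util.*;

public class BrzozowskiSketch {
    /** Toy automaton (hypothetical, not Lucene's Automaton): states are 0..n-1. */
    static final class Nfa {
        // edges.get(state).get(label) -> set of target states
        final List<Map<Character, Set<Integer>>> edges = new ArrayList<>();
        final Set<Integer> initial = new TreeSet<>();
        final Set<Integer> accept = new TreeSet<>();

        int newState() { edges.add(new TreeMap<>()); return edges.size() - 1; }
        int numStates() { return edges.size(); }
        void add(int from, char label, int to) {
            edges.get(from).computeIfAbsent(label, k -> new TreeSet<>()).add(to);
        }
    }

    /** Flip every transition and swap initial/accepting states. */
    static Nfa reverse(Nfa a) {
        Nfa r = new Nfa();
        for (int i = 0; i < a.numStates(); i++) r.newState();
        for (int from = 0; from < a.numStates(); from++)
            for (Map.Entry<Character, Set<Integer>> e : a.edges.get(from).entrySet())
                for (int to : e.getValue()) r.add(to, e.getKey(), from);
        r.initial.addAll(a.accept);
        r.accept.addAll(a.initial);
        return r;
    }

    /** Subset construction over reachable, non-empty subsets (worst case exponential). */
    static Nfa determinize(Nfa a) {
        Nfa d = new Nfa();
        if (a.initial.isEmpty()) return d; // empty language: no states at all
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>(a.initial);
        ids.put(start, d.newState());
        d.initial.add(0);
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> subset = work.poll();
            int id = ids.get(subset);
            if (!Collections.disjoint(subset, a.accept)) d.accept.add(id);
            // Union the outgoing transitions of every state in the subset, per label.
            Map<Character, Set<Integer>> moves = new TreeMap<>();
            for (int s : subset)
                for (Map.Entry<Character, Set<Integer>> e : a.edges.get(s).entrySet())
                    moves.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            for (Map.Entry<Character, Set<Integer>> e : moves.entrySet()) {
                Integer targetId = ids.get(e.getValue());
                if (targetId == null) {
                    targetId = d.newState();
                    ids.put(e.getValue(), targetId);
                    work.add(e.getValue());
                }
                d.add(id, e.getKey(), targetId);
            }
        }
        return d;
    }

    /** Brzozowski: the "2-liner", given reverse() and determinize(). */
    static Nfa minimize(Nfa a) {
        return determinize(reverse(determinize(reverse(a))));
    }

    /** Simulate the automaton on s by tracking the set of current states. */
    static boolean accepts(Nfa a, String s) {
        Set<Integer> cur = new TreeSet<>(a.initial);
        for (char c : s.toCharArray()) {
            Set<Integer> next = new TreeSet<>();
            for (int q : cur) next.addAll(a.edges.get(q).getOrDefault(c, Collections.emptySet()));
            cur = next;
        }
        return !Collections.disjoint(cur, a.accept);
    }

    /** Deterministic but redundant automaton for "a" | "b": two equivalent accept states. */
    static Nfa example() {
        Nfa a = new Nfa();
        int s0 = a.newState(), s1 = a.newState(), s2 = a.newState();
        a.initial.add(s0);
        a.accept.add(s1);
        a.accept.add(s2);
        a.add(s0, 'a', s1);
        a.add(s0, 'b', s2);
        return a;
    }

    public static void main(String[] args) {
        Nfa a = example();
        Nfa m = minimize(a);
        System.out.println(a.numStates() + " -> " + m.numStates()); // prints "3 -> 2"
    }
}
```

   Note that this is itself worst-case exponential, since it determinizes 
twice, so it only makes sense as a test-only helper on small automata.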
   
   I think the problem is understood, longs are insane for docids, I don't wish 
to hold changes up on stupid stuff....


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


