rmuir opened a new pull request #528:
URL: https://github.com/apache/lucene/pull/528


   In current trunk, we let the caller (e.g. RegExpQuery) try to "reduce" the 
expression. Neither the parser nor the low-level executors implicitly call 
exponential-time algorithms anymore.
   
   But now that we have cleaned this up, we can see it is even worse than just 
calling determinize(). We still call minimize(), which is much crazier and does 
much more work.
   
   We stopped doing this for all other AutomatonQuery subclasses a long time 
ago, as we determined that it didn't help performance. Additionally, 
minimization vs. determinization matters even less than in the early days when 
we ran into trouble: the representation got a lot better. Today when you call 
finishState we do a lot of practical sorting/coalescing on-the-fly. Also, we 
added this fancy UTF32-to-UTF8 automata converter, which makes the 
worst-case space per state significantly lower than it was before. So why 
minimize()?
   
   Let's just replace minimize() calls with determinize() calls? I've already 
swapped them out for all of src/test, to get Jenkins looking for issues ahead 
of time.
   
   This change moves Hopcroft minimization (MinimizationOperations) to src/test 
for now. I'd like to explore nuking it from there as a next step; any tests 
that truly need minimization should be fine with Brzozowski's algorithm.
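
   For readers unfamiliar with it, Brzozowski's algorithm minimizes by 
running reverse + determinize twice. The following is only a rough sketch 
under stated assumptions: ToyAutomaton and all of its methods are hypothetical 
names made up for illustration, not Lucene's Automaton/Operations API, and the 
subset construction here has no work limit.

```java
import java.util.*;

/** Hypothetical toy NFA over chars (NOT Lucene's Automaton class). */
final class ToyAutomaton {
    final int numStates;
    final Set<Integer> initial = new HashSet<>();
    final Set<Integer> accept = new HashSet<>();
    // transitions.get(state).get(label) -> set of destination states
    final List<Map<Character, Set<Integer>>> transitions = new ArrayList<>();

    ToyAutomaton(int numStates) {
        this.numStates = numStates;
        for (int i = 0; i < numStates; i++) transitions.add(new HashMap<>());
    }

    void addTransition(int from, char label, int to) {
        transitions.get(from).computeIfAbsent(label, k -> new HashSet<>()).add(to);
    }

    /** Flips every edge and swaps the initial and accept sets. */
    ToyAutomaton reverse() {
        ToyAutomaton r = new ToyAutomaton(numStates);
        r.initial.addAll(accept);
        r.accept.addAll(initial);
        for (int s = 0; s < numStates; s++)
            for (Map.Entry<Character, Set<Integer>> e : transitions.get(s).entrySet())
                for (int d : e.getValue()) r.addTransition(d, e.getKey(), s);
        return r;
    }

    /** Subset construction over reachable subsets only; exponential in the worst case. */
    ToyAutomaton determinize() {
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        List<Set<Integer>> subsets = new ArrayList<>();
        List<int[]> edges = new ArrayList<>(); // {from, label, to}
        subsets.add(new HashSet<>(initial));
        ids.put(subsets.get(0), 0);
        for (int i = 0; i < subsets.size(); i++) {
            // Group the outgoing moves of every NFA state in this subset by label.
            Map<Character, Set<Integer>> moves = new TreeMap<>();
            for (int s : subsets.get(i))
                for (Map.Entry<Character, Set<Integer>> e : transitions.get(s).entrySet())
                    moves.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            for (Map.Entry<Character, Set<Integer>> m : moves.entrySet()) {
                Integer id = ids.get(m.getValue());
                if (id == null) {
                    id = subsets.size();
                    ids.put(m.getValue(), id);
                    subsets.add(m.getValue());
                }
                int label = m.getKey();
                edges.add(new int[] {i, label, id});
            }
        }
        ToyAutomaton d = new ToyAutomaton(subsets.size());
        d.initial.add(0);
        for (int i = 0; i < subsets.size(); i++)
            if (!Collections.disjoint(subsets.get(i), accept)) d.accept.add(i);
        for (int[] e : edges) d.addTransition(e[0], (char) e[1], e[2]);
        return d;
    }

    /** Brzozowski: determinize(reverse(determinize(reverse(a)))). */
    static ToyAutomaton brzozowski(ToyAutomaton a) {
        return a.reverse().determinize().reverse().determinize();
    }

    /** Simulates the automaton on w, tracking the set of current states. */
    boolean accepts(String w) {
        Set<Integer> cur = new HashSet<>(initial);
        for (char c : w.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : cur)
                next.addAll(transitions.get(s).getOrDefault(c, Collections.<Integer>emptySet()));
            cur = next;
        }
        return !Collections.disjoint(cur, accept);
    }

    /** Accepts exactly "ab" and "cb", with deliberately duplicated tail states. */
    static ToyAutomaton example() {
        ToyAutomaton a = new ToyAutomaton(5);
        a.initial.add(0);
        a.accept.add(3);
        a.accept.add(4);
        a.addTransition(0, 'a', 1);
        a.addTransition(0, 'c', 2);
        a.addTransition(1, 'b', 3);
        a.addTransition(2, 'b', 4);
        return a;
    }

    public static void main(String[] args) {
        ToyAutomaton a = example();
        System.out.println(a.determinize().numStates); // prints 5
        System.out.println(brzozowski(a).numStates);   // prints 3
    }
}
```

   In this toy example determinize() alone keeps the duplicated tail states, 
while the double reversal merges them; both results accept the same strings, 
which is all a query needs.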
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


