[ https://issues.apache.org/jira/browse/LUCENE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-10296.
----------------------------------
    Fix Version/s: 10.0 (main)
       Resolution: Fixed

> Stop minimizing regexps
> -----------------------
>
>                 Key: LUCENE-10296
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10296
>             Project: Lucene - Core
>          Issue Type: Task
>    Affects Versions: 10.0 (main)
>            Reporter: Robert Muir
>            Priority: Major
>             Fix For: 10.0 (main)
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In current trunk, we let the caller (e.g. RegExpQuery) try to "reduce" the
> expression. Neither the parser nor the low-level executors implicitly call
> exponential-time algorithms anymore.
> But now that we have cleaned this up, we can see it is even worse than just
> calling {{determinize()}}: we still call {{minimize()}}, which is much
> crazier and much more expensive.
> We stopped doing this for all other AutomatonQuery subclasses a long time
> ago, as we determined that it didn't help performance. Additionally,
> minimization vs. determinization matters even less than in the early days,
> when we first ran into trouble: the representation got a lot better. Today,
> when you call {{finishState}}, we do a lot of practical sorting/coalescing
> on the fly. We also added the fancy UTF32-to-UTF8 automata converter, which
> makes the worst-case space per state significantly lower than it was before.
> So why {{minimize()}}?
> Let's just replace {{minimize()}} calls with {{determinize()}} calls. I've
> already swapped them out for all of {{src/test}}, to get Jenkins looking
> for issues ahead of time.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
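The trade-off behind this issue (determinize only, or determinize and then minimize) can be illustrated with a small, self-contained Java sketch. It is a toy, not Lucene code: the classes `Nfa` and `Dfa` and the methods `determinize` and `minimize` are hypothetical names, the automata use plain `char` labels instead of Lucene's code point ranges, the NFA has no epsilon transitions, and the minimization step is a simple Moore-style partition refinement chosen for brevity, not necessarily the algorithm Lucene used. For the alternation `ab|ac`, subset construction alone already gives a correct DFA; minimization only merges states that no input can distinguish.

```java
import java.util.*;
import java.util.function.Function;

// Toy automata for illustration only; these are NOT Lucene's Automaton/Operations classes.
public class DeterminizeVsMinimize {
    static final class Nfa { // no epsilon transitions; per state: label -> set of target states
        final List<Map<Character, Set<Integer>>> transitions = new ArrayList<>();
        final Set<Integer> accept = new HashSet<>();
        Nfa(int numStates) { for (int i = 0; i < numStates; i++) transitions.add(new HashMap<>()); }
        void addTransition(int from, char label, int to) {
            transitions.get(from).computeIfAbsent(label, k -> new HashSet<>()).add(to);
        }
    }

    static final class Dfa { // partial DFA: at most one target per (state, label); state 0 is initial
        final List<Map<Character, Integer>> transitions = new ArrayList<>();
        final Set<Integer> accept = new HashSet<>();
        int numStates() { return transitions.size(); }
        boolean run(String input) {
            Integer state = 0; // becomes null on a missing transition, which means reject
            for (int i = 0; state != null && i < input.length(); i++)
                state = transitions.get(state).get(input.charAt(i));
            return state != null && accept.contains(state);
        }
    }

    // Subset construction: DFA state i stands for the set of NFA states subsets.get(i).
    static Dfa determinize(Nfa nfa, int start) {
        Dfa dfa = new Dfa();
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        List<Set<Integer>> subsets = new ArrayList<>();
        Function<Set<Integer>, Integer> idOf = set -> ids.computeIfAbsent(set, k -> {
            subsets.add(k); // first time this subset is seen: allocate a DFA state for it
            dfa.transitions.add(new TreeMap<>());
            return subsets.size() - 1;
        });
        idOf.apply(new HashSet<>(List.of(start)));
        for (int id = 0; id < subsets.size(); id++) { // subsets doubles as the worklist
            // Union the targets of all NFA transitions leaving this subset, grouped by label.
            Map<Character, Set<Integer>> moves = new TreeMap<>();
            for (int s : subsets.get(id)) {
                if (nfa.accept.contains(s)) dfa.accept.add(id);
                for (var e : nfa.transitions.get(s).entrySet())
                    moves.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            }
            for (var e : moves.entrySet())
                dfa.transitions.get(id).put(e.getKey(), idOf.apply(e.getValue()));
        }
        return dfa;
    }

    // Moore-style partition refinement (assumes all states reachable, no explicit dead state).
    static Dfa minimize(Dfa dfa) {
        int n = dfa.numStates(), numBlocks = -1;
        SortedSet<Character> alphabet = new TreeSet<>();
        for (var t : dfa.transitions) alphabet.addAll(t.keySet());
        int[] block = new int[n];
        for (int s = 0; s < n; s++) block[s] = dfa.accept.contains(s) ? 1 : 0;
        while (true) {
            // Re-split by (own block, block reached on each label or -1). Ids are handed
            // out in state order, so the initial state 0 always ends up in block 0.
            Map<List<Integer>, Integer> sigIds = new HashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> sig = new ArrayList<>(List.of(block[s]));
                for (char c : alphabet) {
                    Integer t = dfa.transitions.get(s).get(c);
                    sig.add(t == null ? -1 : block[t]);
                }
                next[s] = sigIds.computeIfAbsent(sig, k -> sigIds.size());
            }
            block = next;
            if (sigIds.size() == numBlocks) break; // nothing was split: partition is stable
            numBlocks = sigIds.size();
        }
        Dfa min = new Dfa(); // one state per block; equivalent states are merged
        for (int i = 0; i < numBlocks; i++) min.transitions.add(new TreeMap<>());
        for (int s = 0; s < n; s++) {
            if (dfa.accept.contains(s)) min.accept.add(block[s]);
            for (var e : dfa.transitions.get(s).entrySet())
                min.transitions.get(block[s]).put(e.getKey(), block[e.getValue()]);
        }
        return min;
    }

    public static void main(String[] args) {
        // Hypothetical NFA for the alternation ab|ac: two 'a' edges leave state 0.
        Nfa nfa = new Nfa(5);
        for (int[] e : new int[][] {{0, 'a', 1}, {0, 'a', 2}, {1, 'b', 3}, {2, 'c', 4}})
            nfa.addTransition(e[0], (char) e[1], e[2]);
        nfa.accept.addAll(List.of(3, 4));
        Dfa det = determinize(nfa, 0), min = minimize(det);
        System.out.println("determinized: " + det.numStates() + ", minimized: " + min.numStates());
        for (String s : List.of("ab", "ac", "a", "abc"))
            System.out.println(s + " -> " + det.run(s) + " / " + min.run(s));
    }
}
```

Running it prints "determinized: 4, minimized: 3", followed by identical accept/reject answers from both automata. The extra state left by subset construction costs a little space but never changes what matches, so determinization alone is enough to execute the automaton; minimization is a second pass over an automaton that is already deterministic.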