rmuir opened a new pull request #513: URL: https://github.com/apache/lucene/pull/513
Today, RegExp calls minimize() at every parsing step. There is little point to making an NFA execution when it is doing this. Moreover, some minimize() calls are missing, and in fact in rare cases it can already return an NFA today (for certain syntax) Instead, RegExp parsing should do none of this, instead it may return a DFA or NFA. NOTE: many simple regexps happen to be still returned as DFA, just because of the algorithms in use. Callers can decide whether to determinize or minimize. RegExp parsing should not run in exponential time. All src/java callsites were modified to call minimize(), to prevent any performance problems. minimize() seems unnecessary, but let's approach removing minimization as a separate PR. src/test was fixed to just use determinize() in preparation for this. This PR is very similar to #485 but just for the regexp parsing. One annoying change is we can't support true complement operator (~), which is an obscure non-default operator (must be enabled with flag). negated character classes (e.g. `[^a-z]`) still work as they are contained. But complementing arbitrary automata would require exponential time! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org