rmuir opened a new pull request #513:
URL: https://github.com/apache/lucene/pull/513


   Today, RegExp calls minimize() at every parsing step. There is little point 
to making an NFA execution when it is doing this.
   
   Moreover, some minimize() calls are missing, and in fact in rare cases it 
can already return an NFA today (for certain syntax)
   
   Instead, RegExp parsing should do none of this, instead it may return a DFA 
or NFA. NOTE: many simple regexps happen to be still returned as DFA, just 
because of the algorithms in use.
   
   Callers can decide whether to determinize or minimize. RegExp parsing should 
not run in exponential time.
   
   All src/java callsites were modified to call minimize(), to prevent any 
performance problems. minimize() seems unnecessary, but let's approach removing 
minimization as a separate PR. src/test was fixed to just use determinize() in 
preparation for this.
   
   This PR is very similar to #485 but just for the regexp parsing. 
   
   One annoying change is we can't support true complement operator (~), which 
is an obscure non-default operator (must be enabled with flag). negated 
character classes (e.g. `[^a-z]`) still work as they are contained. But 
complementing arbitrary automata would require exponential time!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to