rmuir commented on PR #15232: URL: https://github.com/apache/lucene/pull/15232#issuecomment-3335658607
For some of these regexps, the automaton may now come out deterministic, or even minimal and deterministic, straight from the parsing phase (this is a good thing!), so we may have to get more creative with the regexps to force `determinize()` to do actual work. When the parsed automaton is already a DFA, `Operations.determinize()` is a no-op, which I think is why you see some of the crazy-fast numbers here.

The best way to check the regexps is to write little throwaway unit tests similar to: https://github.com/apache/lucene/blob/002094613418c4bc6a7e335a8edca82fd26ac03d/lucene/core/src/test/org/apache/lucene/util/automaton/TestRegExpParsing.java#L527-L533

Basically, if the result from `toAutomaton()` passes `assertCleanDFA()`, then you know it is already a DFA and `determinize()` won't do anything. See the assertions here: https://github.com/apache/lucene/blob/de1ed71261d579fdd3cf71b0734f30ea799c4b1f/lucene/test-framework/src/java/org/apache/lucene/tests/util/automaton/AutomatonTestUtil.java#L394-L413
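The sketch below is a toy, self-contained Java example that shows why `determinize()` costs almost nothing on input that is already a DFA. The `DeterminizeSketch` and `Nfa` classes are hypothetical. They are not Lucene's `Automaton` or `Operations` API.

The sketch makes a few simplifying assumptions:

- Labels are single chars and there are no epsilon transitions, whereas Lucene uses int ranges.
- `isDeterministic()` models only the determinism condition, not whatever else `assertCleanDFA()` verifies.
- `determinize()` is plain subset construction with an early return when the input is already deterministic. This mirrors the no-op behavior the comment describes.

```java
import java.util.*;

public class DeterminizeSketch {

  /** Toy automaton (not Lucene's Automaton): state 0 is initial, labels are single chars. */
  static final class Nfa {
    final List<Map<Character, Set<Integer>>> transitions = new ArrayList<>();
    final Set<Integer> accept = new HashSet<>();

    int addState(boolean isAccept) {
      transitions.add(new TreeMap<>());
      int state = transitions.size() - 1;
      if (isAccept) accept.add(state);
      return state;
    }

    void addTransition(int from, char label, int to) {
      transitions.get(from).computeIfAbsent(label, k -> new TreeSet<>()).add(to);
    }

    int numStates() {
      return transitions.size();
    }

    /** Deterministic iff no state has more than one target for the same label. */
    boolean isDeterministic() {
      for (Map<Character, Set<Integer>> out : transitions) {
        for (Set<Integer> targets : out.values()) {
          if (targets.size() > 1) return false;
        }
      }
      return true;
    }
  }

  /** Subset construction that hands back its input untouched when it is already a DFA. */
  static Nfa determinize(Nfa a) {
    if (a.isDeterministic()) {
      return a; // no-op: no subsets are built, so this costs almost nothing
    }
    Nfa dfa = new Nfa();
    Map<Set<Integer>, Integer> ids = new HashMap<>(); // NFA state subset -> DFA state
    Deque<Set<Integer>> work = new ArrayDeque<>();
    Set<Integer> start = new TreeSet<>(List.of(0));
    ids.put(start, dfa.addState(a.accept.contains(0)));
    work.add(start);
    while (!work.isEmpty()) {
      Set<Integer> subset = work.poll();
      int from = ids.get(subset);
      // For each label, union the targets of every NFA state in this subset.
      Map<Character, Set<Integer>> next = new TreeMap<>();
      for (int s : subset) {
        a.transitions.get(s).forEach(
            (label, targets) -> next.computeIfAbsent(label, k -> new TreeSet<>()).addAll(targets));
      }
      for (Map.Entry<Character, Set<Integer>> e : next.entrySet()) {
        Integer to = ids.get(e.getValue());
        if (to == null) { // first time we see this subset: it becomes a new DFA state
          to = dfa.addState(e.getValue().stream().anyMatch(a.accept::contains));
          ids.put(e.getValue(), to);
          work.add(e.getValue());
        }
        dfa.addTransition(from, e.getKey(), to);
      }
    }
    return dfa;
  }

  /** "ab", built directly as a DFA. */
  static Nfa ab() {
    Nfa n = new Nfa();
    int s0 = n.addState(false), s1 = n.addState(false), s2 = n.addState(true);
    n.addTransition(s0, 'a', s1);
    n.addTransition(s1, 'b', s2);
    return n;
  }

  /** "ab|ac", built as an NFA: state 0 has two different targets on 'a'. */
  static Nfa abOrAc() {
    Nfa n = new Nfa();
    int s0 = n.addState(false), s1 = n.addState(false), s2 = n.addState(false);
    int s3 = n.addState(true);
    n.addTransition(s0, 'a', s1);
    n.addTransition(s0, 'a', s2);
    n.addTransition(s1, 'b', s3);
    n.addTransition(s2, 'c', s3);
    return n;
  }

  public static void main(String[] args) {
    Nfa dfa = ab();
    System.out.println("ab: no-op = " + (determinize(dfa) == dfa));
    Nfa nfa = abOrAc();
    Nfa d = determinize(nfa);
    System.out.println("ab|ac: no-op = " + (d == nfa)
        + ", deterministic = " + d.isDeterministic() + ", states = " + d.numStates());
  }
}
```

Running `main` prints the following two lines:

- `ab: no-op = true`
- `ab|ac: no-op = false, deterministic = true, states = 3`

The first call never enters subset construction at all. A benchmark whose inputs all take that path would be timing little more than the determinism check.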
