msfroh commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722961519
To the best of my understanding from reading through the code while sketching this PR, I believe it would produce a minimal DFA if every character in a set of alternatives in the input strings has the same canonical representation. (The existing implementation already throws if the input is not sorted BytesRefs.) That is, if you input `cap, cat, cats, cob`, it will generate the minimal DFA. If you input `CAP, CAT, CATS, COB`, you'll end up with the same minimal DFA (albeit with the transitions added in the opposite order, which I think is fine). But if you input `CAP, CATS, cat, cob`, you'll end up with an NFA.
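As a rough illustration of the condition above (this is not code from the PR and does not touch Lucene's automaton classes), here is a standalone sketch that checks whether a list of terms is cased consistently. It rests on one reading of the condition: any two terms that share a prefix when case is ignored must also share that prefix literally. The class and method names (`CaseConsistencyCheck`, `hasConsistentCasing`, `commonPrefixLength`) are hypothetical, and `Character.toLowerCase` on Java chars stands in for whatever canonical representation the real code uses.

```java
import java.util.List;

public class CaseConsistencyCheck {

    // Length of the longest common prefix of a and b. When ignoreCase is
    // true, characters are compared after lowercasing (an assumption: the
    // real canonical form may differ).
    static int commonPrefixLength(String a, String b, boolean ignoreCase) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (ignoreCase) {
                x = Character.toLowerCase(x);
                y = Character.toLowerCase(y);
            }
            if (x != y) {
                break;
            }
            i++;
        }
        return i;
    }

    // True if every pair of terms has the same common prefix length whether
    // or not case is ignored, i.e. no shared prefix is spelled in two
    // different casings. Quadratic in the number of terms; fine for a sketch.
    static boolean hasConsistentCasing(List<String> terms) {
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                String a = terms.get(i);
                String b = terms.get(j);
                if (commonPrefixLength(a, b, true) != commonPrefixLength(a, b, false)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The three inputs from the comment above.
        System.out.println(hasConsistentCasing(List.of("cap", "cat", "cats", "cob")));
        System.out.println(hasConsistentCasing(List.of("CAP", "CAT", "CATS", "COB")));
        System.out.println(hasConsistentCasing(List.of("CAP", "CATS", "cat", "cob")));
    }
}
```

Under this reading, the first two inputs pass the check and the third fails, because `CATS` and `cat` share the prefix `cat` only when case is ignored. Whether this check coincides exactly with when the builder yields a minimal DFA is an assumption I have not verified against the implementation.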