msfroh commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722961519

   To the best of my understanding from reading through the code while 
sketching this PR, I believe it would produce a minimal DFA if every character 
in a set of alternatives in the input strings has the same canonical 
representation. (The existing implementation already throws if the input is 
not sorted BytesRefs.)
   
   That is, if you input `cap, cat, cats, cob`, it will generate the minimal 
DFA. If you input `CAP, CAT, CATS, COB`, you'll end up with the same minimal 
DFA (albeit with the transitions added in the opposite order, which I think is 
fine). But if you input `CAP, CATS, cat, cob`, you'll end up with an NFA.
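
   One way to see why the third input behaves differently is to compare the 
byte order of the raw terms with the byte order of their case-folded forms. 
The sketch below is not code from the PR. `FoldedOrderCheck`, `canonicalBytes`, 
and `isSorted` are hypothetical names, and lowercasing with `Locale.ROOT` is 
only an assumed stand-in for whatever canonical representation the PR uses. 
It shows that all three inputs are sorted as raw bytes, but only the first two 
stay sorted once folded, which may be what the incremental construction 
depends on.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class FoldedOrderCheck {

  // Assumption: lowercasing stands in for the "canonical representation".
  static byte[] canonicalBytes(String term) {
    return term.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
  }

  // Returns true if the terms are in non-decreasing unsigned byte order,
  // optionally after folding each term to its canonical form.
  static boolean isSorted(boolean fold, String... terms) {
    byte[] prev = null;
    for (String term : terms) {
      byte[] cur = fold ? canonicalBytes(term) : term.getBytes(StandardCharsets.UTF_8);
      if (prev != null && Arrays.compareUnsigned(prev, cur) > 0) {
        return false;
      }
      prev = cur;
    }
    return true;
  }

  public static void main(String[] args) {
    String[][] inputs = {
      {"cap", "cat", "cats", "cob"},
      {"CAP", "CAT", "CATS", "COB"},
      {"CAP", "CATS", "cat", "cob"},
    };
    for (String[] input : inputs) {
      System.out.println(
          String.join(", ", input)
              + " -> raw sorted: " + isSorted(false, input)
              + ", folded sorted: " + isSorted(true, input));
    }
  }
}
```

   For the mixed-case input, `CATS` sorts before `cat` as raw bytes (because 
`C` < `c`), but after folding, `cats` arrives before `cat`, so the folded 
sequence is out of order even though the sorted-input check passes.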


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

