rmuir opened a new pull request, #14193: URL: https://github.com/apache/lucene/pull/14193
Previously, caseless matching was implemented via code such as this:

```java
Operations.union(Automata.makeChar('x'), Automata.makeChar('X'))
```

The proposed Unicode caseless matching (#14192) implements it with repeated unions:

```java
Automaton a1 = Operations.union(Automata.makeChar('x'), Automata.makeChar('X'));
Automaton a2 = Operations.union(a1, Automata.makeChar('y'));
Automaton a3 = Operations.union(a2, Automata.makeChar('Y'));
```

The union operation doesn't return a minimal automaton. Improving union would always be nice, but this change offers a simple API for the task that returns half the number of states.

Before: caseless match of "a": [image]

After: [image]

Before: caseless match of "lucene": [image]

After: [image]

Just like `union`, `concatenate` adds some useless states, but they are less of a problem than the ones from before. I didn't try anything more, such as repeated union or Kleene star, to see if I could construct a really bad case; I felt this was good enough to get it to a better place. We can still look at optimizing union/concatenate separately, but that's always more dangerous and tricky.
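To illustrate why chaining unions inflates the state count, here is a minimal, self-contained sketch. It uses a toy automaton model, not Lucene's `Automaton` or `Operations` classes: the `Toy` class, the naive `union`, and the `makeCharSet` helper are all hypothetical names invented for this illustration, and the exact state counts will differ from what Lucene produces. The sketch assumes a union that adds a fresh initial state and copies both operands, and compares that with building the character set directly as one initial state and one accept state.

```java
import java.util.ArrayList;
import java.util.List;

public class CaselessSketch {

    /** Toy automaton (hypothetical, not Lucene's Automaton). State 0 is always the initial state. */
    static final class Toy {
        int numStates;
        final List<int[]> transitions = new ArrayList<>(); // each entry: {from, to, codePoint}
        final List<Integer> acceptStates = new ArrayList<>();
    }

    /** Two states, one transition: accepts exactly the given code point. */
    static Toy makeChar(int codePoint) {
        return makeCharSet(codePoint);
    }

    /** Two states, one transition per code point: accepts any single listed code point. */
    static Toy makeCharSet(int... codePoints) {
        Toy t = new Toy();
        t.numStates = 2;
        for (int cp : codePoints) {
            t.transitions.add(new int[] {0, 1, cp});
        }
        t.acceptStates.add(1);
        return t;
    }

    /** Naive union (an assumption of this sketch): fresh initial state plus copies of both operands. */
    static Toy union(Toy a, Toy b) {
        Toy u = new Toy();
        u.numStates = 1; // the fresh initial state
        copyInto(u, a);
        copyInto(u, b);
        return u;
    }

    private static void copyInto(Toy target, Toy source) {
        int offset = target.numStates;
        target.numStates += source.numStates;
        for (int[] t : source.transitions) {
            target.transitions.add(new int[] {t[0] + offset, t[1] + offset, t[2]});
            if (t[0] == 0) {
                // Mirror the operand's initial transitions on the new initial state.
                target.transitions.add(new int[] {0, t[1] + offset, t[2]});
            }
        }
        for (int s : source.acceptStates) {
            target.acceptStates.add(s + offset);
            if (s == 0 && !target.acceptStates.contains(0)) {
                target.acceptStates.add(0);
            }
        }
    }

    /** True if the automaton accepts the one-character input consisting of codePoint. */
    static boolean acceptsSingleChar(Toy a, int codePoint) {
        for (int[] t : a.transitions) {
            if (t[0] == 0 && t[2] == codePoint && a.acceptStates.contains(t[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Toy a1 = union(makeChar('x'), makeChar('X'));
        Toy a2 = union(a1, makeChar('y'));
        Toy a3 = union(a2, makeChar('Y'));
        Toy direct = makeCharSet('x', 'X', 'y', 'Y');
        System.out.println("repeated union states: " + a3.numStates);
        System.out.println("direct char set states: " + direct.numStates);
    }
}
```

In this toy model every union carries along the copied operand states, so the count grows with each added character, while the direct construction stays at two states regardless of how many code points it matches. Both accept the same single-character inputs.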