msfroh commented on code in PR #14350:
URL: https://github.com/apache/lucene/pull/14350#discussion_r1992783083

##########
lucene/core/src/java/org/apache/lucene/util/automaton/StringsToAutomaton.java:
##########
@@ -209,7 +209,25 @@ private static int convert(
     int i = 0;
     int[] labels = s.labels;
     for (StringsToAutomaton.State target : s.states) {
-      a.addTransition(converted, convert(a, target, visited), labels[i++]);
+      int label = labels[i++];
+      int dest = convert(a, target, visited, caseInsensitive);
+      a.addTransition(converted, dest, label);
+      if (caseInsensitive) {
+        int[] alternatives = CaseFolding.lookupAlternates(label);
+        if (alternatives != null) {
+          for (int alt : alternatives) {
+            a.addTransition(converted, dest, alt);
+          }
+        } else {
+          int altCase =
+              Character.isLowerCase(label)
+                  ? Character.toUpperCase(label)
+                  : Character.toLowerCase(label);
+          if (altCase != label) {
+            a.addTransition(converted, dest, altCase);

Review Comment:
   Essentially, I tried to copy what you did for case-insensitive regex matching: for each transition, add extra arcs for the other letter cases. I think the `finish` call is handled at the end.

   Note that this implementation will be much more efficient if all of the input strings are in the same case. Otherwise, it might miss common (case-insensitive) prefixes, so inputs such as `Foo` and `foo` would not share states. I'm imagining that a query would lowercase everything first.
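
To make the prefix-sharing point concrete, here is a minimal, self-contained sketch. It does not use Lucene: a plain trie stands in for the automaton that `StringsToAutomaton` builds, and `Node`, `countStates`, and `lowercase` are hypothetical names introduced only for illustration. `altCase` mirrors the single-code-point fallback shown in the diff, not the `CaseFolding.lookupAlternates` path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

public class CaseInsensitivePrefixSketch {
  /** Hypothetical trie node, standing in for an automaton state. */
  static final class Node {
    final TreeMap<Integer, Node> children = new TreeMap<>();
  }

  /** Same fallback as in the diff: flip the case of a single code point. */
  static int altCase(int label) {
    return Character.isLowerCase(label)
        ? Character.toUpperCase(label)
        : Character.toLowerCase(label);
  }

  /** Inserts every term into a trie and returns the number of states, root included. */
  static int countStates(List<String> terms) {
    Node root = new Node();
    int states = 1;
    for (String term : terms) {
      Node current = root;
      for (int i = 0; i < term.length(); ) {
        int cp = term.codePointAt(i);
        i += Character.charCount(cp);
        Node next = current.children.get(cp);
        if (next == null) {
          // No existing arc for this exact code point: a new state is needed.
          next = new Node();
          current.children.put(cp, next);
          states++;
        }
        current = next;
      }
    }
    return states;
  }

  /** Normalizes all terms to lower case before building, as a query might. */
  static List<String> lowercase(List<String> terms) {
    List<String> result = new ArrayList<>();
    for (String term : terms) {
      result.add(term.toLowerCase(Locale.ROOT));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> mixed = List.of("Foobar", "foobaz", "FOOD");
    System.out.println("states, mixed case: " + countStates(mixed));
    System.out.println("states, lowercased: " + countStates(lowercase(mixed)));
    System.out.println("alternate of 'f':   " + (char) altCase('f'));
  }
}
```

With these three inputs the mixed-case trie needs 16 states, while the lowercased one needs 9, because `foobar`, `foobaz`, and `food` share the prefix `foo`. In the case-insensitive automaton, each arc would additionally get its `altCase` twin, so the lowercased build accepts the same strings with far fewer states.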