Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

via GitHub Wed, 12 Mar 2025 21:47:51 -0700


msfroh commented on code in PR #14350:
URL: https://github.com/apache/lucene/pull/14350#discussion_r1992783083



##########
lucene/core/src/java/org/apache/lucene/util/automaton/StringsToAutomaton.java:
##########
@@ -209,7 +209,25 @@ private static int convert(
     int i = 0;
     int[] labels = s.labels;
     for (StringsToAutomaton.State target : s.states) {
-      a.addTransition(converted, convert(a, target, visited), labels[i++]);
+      int label = labels[i++];
+      int dest = convert(a, target, visited, caseInsensitive);
+      a.addTransition(converted, dest, label);
+      if (caseInsensitive) {
+        int[] alternatives = CaseFolding.lookupAlternates(label);
+        if (alternatives != null) {
+          for (int alt : alternatives) {
+            a.addTransition(converted, dest, alt);
+          }
+        } else {
+          int altCase =
+                  Character.isLowerCase(label)
+                          ? Character.toUpperCase(label)
+                          : Character.toLowerCase(label);
+          if (altCase != label) {
+            a.addTransition(converted, dest, altCase);

Review Comment:
   Essentially, I tried to copy what you did for case-insensitive regex 
matching: for each transition, add extra arcs for the other letter cases.
   
   I think the `finish` call is handled at the end.
   
   Note that this implementation will be much more efficient if all of the 
input strings are in the same case. Otherwise, it might miss common 
(case-insensitive) prefixes, since inputs like `Foo` and `foo` start with 
different labels and so would not share prefix states. I'm imagining that a 
query would lowercase everything first.
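
To make the prefix-sharing point concrete, here is a minimal sketch that counts the nodes of a plain prefix trie built from mixed-case inputs versus the same inputs lowercased first. It is only an illustration under stated assumptions: `PrefixSharingSketch`, `Node`, `countTrieNodes`, and `lowercaseAll` are hypothetical names, and the plain trie is a simplified stand-in for the state structure in `StringsToAutomaton`, not the actual Lucene implementation (it does no suffix sharing and ignores the alternate-case arcs).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the PR or of Lucene.
final class PrefixSharingSketch {

  /** Trie node keyed by code point; a simplified stand-in for an automaton state. */
  static final class Node {
    final Map<Integer, Node> children = new HashMap<>();
  }

  /** Builds a case-sensitive prefix trie and returns its node count, root included. */
  static int countTrieNodes(List<String> inputs) {
    Node root = new Node();
    int count = 1; // the root
    for (String s : inputs) {
      Node cur = root;
      for (int i = 0; i < s.length(); ) {
        int cp = s.codePointAt(i);
        i += Character.charCount(cp);
        Node next = cur.children.get(cp);
        if (next == null) {
          // No existing arc with this exact label, so a new state is needed.
          next = new Node();
          cur.children.put(cp, next);
          count++;
        }
        cur = next;
      }
    }
    return count;
  }

  /** Lowercases every input, as a query might do before building the automaton. */
  static List<String> lowercaseAll(List<String> inputs) {
    return inputs.stream()
        .map(s -> s.toLowerCase(Locale.ROOT))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> mixed = List.of("Foobar", "foobaz", "FOOqux");
    System.out.println(countTrieNodes(mixed));               // prints 18
    System.out.println(countTrieNodes(lowercaseAll(mixed))); // prints 11
  }
}
```

With mixed-case input, `Foobar` and `foobaz` diverge at the first label, so nothing is shared between them; after lowercasing, all three strings share the `foo` prefix and the trie shrinks from 18 nodes to 11.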



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

