dweiss commented on code in PR #14381: URL: https://github.com/apache/lucene/pull/14381#discussion_r2006930601
########## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ########## @@ -778,6 +786,53 @@ private int[] toCaseInsensitiveChar(int codepoint) { } } + /** + * Expands range to include case-insensitive matches. + * + * <p>This is expensive: case-insensitive range involves iterating over the range space, adding + * alternatives. Jump on the grenade here, contain CPU and memory explosion just to this method + * activated by optional flag. + */ + private void expandCaseInsensitiveRange( + int start, int end, List<Integer> rangeStarts, List<Integer> rangeEnds) { + if (start > end) + throw new IllegalArgumentException( + "invalid range: from (" + start + ") cannot be > to (" + end + ")"); + + // contain the explosion of transitions by using a throwaway state + Automaton scratch = new Automaton(); + int state = scratch.createState(); + + // iterate over range, adding codepoint and any alternatives as transitions + for (int i = start; i <= end; i++) { + scratch.addTransition(state, state, i); + int[] altCodePoints = CaseFolding.lookupAlternates(i); + if (altCodePoints != null) { + for (int alt : altCodePoints) { + scratch.addTransition(state, state, alt); + } + } else { + int altCase = + Character.isLowerCase(i) ? Character.toUpperCase(i) : Character.toLowerCase(i); + if (altCase != i) { + scratch.addTransition(state, state, altCase); + } + } + } Review Comment: I think this pattern is used in more than one place. Maybe it'd be nicer to make CaseFolding.forAllAlternatives(altCase -> scratch.addTransition(state, state, altCase)) and have all that logic for checking for alternatives embedded in CaseFolding? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org