msfroh commented on code in PR #14350:
URL: https://github.com/apache/lucene/pull/14350#discussion_r2002184400
########## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFolding.java: ##########

```diff
@@ -743,4 +743,42 @@ static int[] lookupAlternates(int codepoint) {
     return alts;
   }
+
+  /**
+   * Folds the case of the given character according to {@link Character#toLowerCase(int)}, but with
+   * exceptions if the turkic flag is set.
+   *
+   * @param codepoint the code point for the character to fold
+   * @param turkic if true, then apply tr/az folding rules
+   * @return the folded character
+   */
+  static int foldCase(int codepoint, boolean turkic) {
+    if (turkic) {
+      if (codepoint == 0x00130) { // İ [LATIN CAPITAL LETTER I WITH DOT ABOVE]
+        return 0x00069; // i [LATIN SMALL LETTER I]
+      } else if (codepoint == 0x000049) { // I [LATIN CAPITAL LETTER I]
+        return 0x00131; // ı [LATIN SMALL LETTER DOTLESS I]
+      }
+    }
+    return Character.toLowerCase(codepoint);
```

Review Comment:
Got it. Checking https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt, I can indeed see those entries:

```
03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
...
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
```

Ideally, I'd love to just use those folding rules. I could get them from `UCharacter.foldCase(int, boolean)`, but that would mean pulling in icu4j as a dependency, which is an extra 12MB jar.

Would it be worthwhile to write a generator that pulls https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt (updated to whatever the current Unicode version is) and generates a `foldCase` method that's functionally equivalent to `UCharacter.foldCase(int, boolean)`?
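This quick check shows the gap behind those sigma entries. `Character.toLowerCase(int)` applies only lowercase mappings, so ς (U+03C2), which is already lowercase, comes back unchanged. CaseFolding.txt, by contrast, folds it to σ (U+03C3). The class name below is illustrative only.

```java
/** Illustrative demo: compares Character.toLowerCase for the three forms of Greek sigma. */
class SigmaFoldingDemo {
  static final int CAPITAL_SIGMA = 0x03A3;     // Σ
  static final int SMALL_SIGMA = 0x03C3;       // σ
  static final int SMALL_FINAL_SIGMA = 0x03C2; // ς

  public static void main(String[] args) {
    int[] sigmas = {CAPITAL_SIGMA, SMALL_SIGMA, SMALL_FINAL_SIGMA};
    for (int cp : sigmas) {
      // toLowerCase has no mapping for ς because it is already lowercase,
      // whereas CaseFolding.txt (status C) maps 03C2 -> 03C3.
      System.out.printf("U+%04X -> U+%04X%n", cp, Character.toLowerCase(cp));
    }
    // Prints:
    // U+03A3 -> U+03C3
    // U+03C3 -> U+03C3
    // U+03C2 -> U+03C2
  }
}
```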
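Here is a rough sketch of the generator idea. It assumes the generator already has the text of CaseFolding.txt (the download step is left out) and emits a switch-based method. `CaseFoldingGenerator` and its methods are hypothetical and are not part of Lucene. The sketch keeps C and S entries (simple folding) plus T entries for the Turkic flag. It skips F entries, because a single `int` cannot hold a multi-code-point mapping. This is assumed to match what ICU4J's `UCharacter.foldCase(int ch, boolean defaultmapping)` does, but that equivalence still needs to be verified against ICU. Note also that ICU's flag has the opposite sense: `false` selects the T mappings.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical generator sketch. Reads CaseFolding.txt-formatted text,
 * "<code>; <status>; <mapping>; # <name>", and keeps:
 *   C + S entries: simple case folding (one code point in, one out)
 *   T entries:     Turkic overrides for dotted/dotless I
 * F entries (full folding to several code points) are skipped.
 */
class CaseFoldingGenerator {
  private final Map<Integer, Integer> simpleMap = new TreeMap<>();
  private final Map<Integer, Integer> turkicMap = new TreeMap<>();

  CaseFoldingGenerator(String caseFoldingTxt) {
    caseFoldingTxt.lines().forEach(this::parseLine);
  }

  private void parseLine(String line) {
    int hash = line.indexOf('#');
    String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
    if (data.isEmpty()) {
      return; // blank line or pure comment
    }
    String[] fields = data.split(";");
    int from = Integer.parseInt(fields[0].trim(), 16);
    String status = fields[1].trim();
    if (status.equals("C") || status.equals("S")) {
      simpleMap.put(from, Integer.parseInt(fields[2].trim(), 16));
    } else if (status.equals("T")) {
      turkicMap.put(from, Integer.parseInt(fields[2].trim(), 16));
    } // "F" (multi-code-point) entries are ignored
  }

  /** Table-driven equivalent of the method that generateSource() emits. */
  int foldCase(int codepoint, boolean turkic) {
    if (turkic && turkicMap.containsKey(codepoint)) {
      return turkicMap.get(codepoint);
    }
    return simpleMap.getOrDefault(codepoint, codepoint);
  }

  /** Emits Java source for a switch-based foldCase(int, boolean) method. */
  String generateSource() {
    StringBuilder sb = new StringBuilder();
    sb.append("static int foldCase(int codepoint, boolean turkic) {\n");
    sb.append("  if (turkic) {\n    switch (codepoint) {\n");
    turkicMap.forEach((from, to) ->
        sb.append(String.format("      case 0x%04X: return 0x%04X;\n", from, to)));
    sb.append("    }\n  }\n  switch (codepoint) {\n");
    simpleMap.forEach((from, to) ->
        sb.append(String.format("    case 0x%04X: return 0x%04X;\n", from, to)));
    sb.append("    default: return codepoint;\n  }\n}\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    // A few real CaseFolding.txt lines; a real generator would fetch the full file.
    String sample = String.join("\n",
        "# comment lines like this one are skipped",
        "0049; C; 0069; # LATIN CAPITAL LETTER I",
        "0049; T; 0131; # LATIN CAPITAL LETTER I",
        "0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
        "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
        "03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA",
        "03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA");
    System.out.print(new CaseFoldingGenerator(sample).generateSource());
  }
}
```

Running `main` prints a method whose Turkic branch maps I to ı and İ to i, matching the hand-written branch in the diff, and whose default branch folds ς to σ. One behavioral difference from the diff is worth noting. With the flag off, U+0130 has only F and T entries, so this sketch returns it unchanged. `Character.toLowerCase` maps it to `i` instead. A single large `switch` keeps the sketch short. A generator for the full file might prefer compact lookup arrays instead, which is a design choice left open here.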