rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2726984173
+1 to start simple with `Character.toLowerCase`; that's the best you can get in Java. The problem is that Java has no `Character.foldCase`. A proper function would look like ICU's `UCharacter.foldCase(int, boolean)`: https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/lang/UCharacter.html#foldCase-int-boolean-

The regexp folding code doesn't handle Turkish correctly either. Dotless and dotted I are DIFFERENT characters, but it mixes them all up and conflates them.

So I'd like for us not to perpetuate this any further: creating some "nonstandard case folding" disagrees with the Unicode standard.
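
To make the conflation concrete, here is a minimal sketch using only `java.lang.Character`. The class and helper names (`TurkishIFoldingSketch`, `lowerFold`, `roundTripFold`) are hypothetical, and this is not the regexp folding code from the PR; `roundTripFold` is just an assumed stand-in for any scheme that folds by going through uppercase and back.

```java
public class TurkishIFoldingSketch {

    // Hypothetical helper: approximate folding with a plain, locale-independent lowercase.
    static int lowerFold(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    // Hypothetical helper: "fold" by uppercasing first, then lowercasing.
    // Assumed stand-in for a folding scheme that merges everything sharing an uppercase form.
    static int roundTripFold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    public static void main(String[] args) {
        // I, i, dotted capital I (U+0130), dotless small i (U+0131)
        int[] samples = {'I', 'i', 0x0130, 0x0131};
        for (int cp : samples) {
            System.out.printf("U+%04X  lowerFold=U+%04X  roundTripFold=U+%04X%n",
                cp, lowerFold(cp), roundTripFold(cp));
        }
    }
}
```

With plain lowercasing, dotless ı (U+0131) maps to itself and stays distinct from `i`, while the round trip through uppercase turns it into `i`. Note that `Character.toLowerCase` still maps dotted İ (U+0130) to `i`, so it is only an approximation of real case folding, which is why a proper `foldCase` would be preferable if Java had one.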