rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2726984173

   +1 to start simple with Character.toLowerCase, that's the best you can get in 
Java.
   
   The problem is that Java has no Character.foldCase. A proper function would 
look like ICU's `UCharacter.foldCase(int, boolean)`: 
https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/lang/UCharacter.html#foldCase-int-boolean-
   
   The regexp folding code doesn't handle Turkish correctly either. Dotless and 
dotted I are DIFFERENT characters, but it mixes them all up and conflates 
them. So I'd like us not to perpetuate this further by creating a 
"nonstandard case folding" that disagrees with the Unicode standard.
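
   To make the conflation concrete, here is a minimal sketch using only JDK calls. The class and method names (`TurkishIDemo`, `sameAfterLowerCase`, `sameAfterUpperThenLower`) are made up for illustration and are not from the PR or from Lucene. It assumes the JDK's simple case mappings, where U+0130 lowercases to `i` and U+0131 uppercases to `I`.

```java
public class TurkishIDemo {
    // Hypothetical helper: the "compare after lowercasing" shortcut.
    public static boolean sameAfterLowerCase(int a, int b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Hypothetical helper: an upper-then-lower round trip, a common way
    // to approximate case-insensitive comparison without real folding.
    public static boolean sameAfterUpperThenLower(int a, int b) {
        return Character.toLowerCase(Character.toUpperCase(a))
            == Character.toLowerCase(Character.toUpperCase(b));
    }

    public static void main(String[] args) {
        int dottedCapitalI = 0x0130; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        int dotlessSmallI = 0x0131;  // LATIN SMALL LETTER DOTLESS I

        // Lowercasing alone already merges dotted capital I with plain 'i'.
        System.out.println(sameAfterLowerCase('i', dottedCapitalI));     // true
        // Lowercasing alone keeps dotless small i apart from plain 'i'.
        System.out.println(sameAfterLowerCase('i', dotlessSmallI));      // false
        // The round trip merges dotless small i with plain 'i' as well.
        System.out.println(sameAfterUpperThenLower('i', dotlessSmallI)); // true
    }
}
```

   As far as I understand it, Unicode's default simple case folding leaves both U+0130 and U+0131 mapped to themselves, so neither of the `true` results above would be a match under standard folding.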


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

