maomao905 commented on issue #11976: URL: https://github.com/apache/lucene/issues/11976#issuecomment-1327957480
Thanks!

> both 1 and 月 should have same offsets as they come from same input character ㋀

I ran your suggested test and it failed. I am not sure whether this is a bug or not. The end offset of the character `1` is 2 (the expected offset is 3).

```
$ ./gradlew test --tests org.apache.lucene.analysis.icu.TestICUNormalizer2CharFilter.testDecomposeFromSameInputCharacter
...
org.apache.lucene.analysis.icu.TestICUNormalizer2CharFilter > testDecomposeFromSameInputCharacter FAILED
    java.lang.AssertionError: endOffset 2 term=1 expected:<3> but was:<2>
...
```

> Test/issue should not be named "combining character" as there are no combining characters involved. "combining character" has a very specific meaning in unicode and this is not that.

I changed the issue title from "combining character" to "compatibility character". `㋀` appears to belong to the [Enclosed CJK Letters and Months](https://en.wikipedia.org/wiki/Enclosed_CJK_Letters_and_Months) block in Unicode.
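For readers following along, here is a minimal sketch of the quoted expectation that `1` and `月` should share the offsets of the single input character `㋀` (U+32C0). It is not how `ICUNormalizer2CharFilter` computes offsets. It uses the JDK's `java.text.Normalizer` rather than ICU, and it normalizes one code point at a time, which is a simplification that ignores composition across neighbouring characters. The names `CompatOffsets`, `Mapped`, and `mapPerCodePoint` are hypothetical and exist only for this illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class CompatOffsets {
    /** One output character plus the input span [start, end) it came from. */
    static final class Mapped {
        final String text;
        final int start;
        final int end;

        Mapped(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical helper (assumption, not Lucene code): applies NFKC to each
    // input code point separately and tags every resulting character with the
    // offsets of the input code point that produced it.
    static List<Mapped> mapPerCodePoint(String input) {
        List<Mapped> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int len = Character.charCount(input.codePointAt(i));
            String norm = Normalizer.normalize(
                input.substring(i, i + len), Normalizer.Form.NFKC);
            int j = 0;
            while (j < norm.length()) {
                int cp = norm.codePointAt(j);
                out.add(new Mapped(new String(Character.toChars(cp)), i, i + len));
                j += Character.charCount(cp);
            }
            i += len;
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u32C0" is the single input character from the quoted comment.
        for (Mapped m : mapPerCodePoint("\u32C0")) {
            System.out.println(m.text + " [" + m.start + "," + m.end + ")");
        }
    }
}
```

With the single-character input used in `main`, NFKC yields two output characters, and under this simplified model both carry the same start and end offsets because they were derived from the same input character. The actual input string and expected offsets of `testDecomposeFromSameInputCharacter` are not shown in this comment, so the numbers here should not be read as those of the failing test.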