maomao905 commented on issue #11976:
URL: https://github.com/apache/lucene/issues/11976#issuecomment-1327957480
Thanks!
> both 1 and 月 should have same offsets as they come from same input character ㋀

I ran your suggested test and it failed. I am not sure whether this is a bug
or not.
The end offset of the term `1` is 2, but the test expects 3.
```
$ ./gradlew test --tests org.apache.lucene.analysis.icu.TestICUNormalizer2CharFilter.testDecomposeFromSameInputCharacter
...
org.apache.lucene.analysis.icu.TestICUNormalizer2CharFilter > testDecomposeFromSameInputCharacter FAILED
java.lang.AssertionError: endOffset 2 term=1 expected:<3> but was:<2>
...
```
> Test/issue should not be named "combining character" as there are no combining characters involved. "combining character" has a very specific meaning in unicode and this is not that.

I changed the issue title from "combining character" to "compatibility
character".
`㋀` appears to belong to the [Enclosed CJK Letters and
Months](https://en.wikipedia.org/wiki/Enclosed_CJK_Letters_and_Months) block in
Unicode.
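
To double-check the terminology, here is a minimal sketch that uses only the JDK (`java.text.Normalizer`), not Lucene or ICU4J, so it says nothing about how `ICUNormalizer2CharFilter` computes offsets. The class name `CompatDecomposition` and its helper methods are made up for illustration. It shows that NFKC maps the single character `㋀` (U+32C0) to the two characters `1` and `月`, and that neither of them is a combining mark.

```java
import java.text.Normalizer;

public class CompatDecomposition {
    // U+32C0, the character discussed in this issue.
    static final String JANUARY = "\u32C0";

    // Apply NFKC (compatibility decomposition followed by canonical composition).
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // True if the code point is in one of the Unicode mark categories (Mn, Mc, Me).
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    public static void main(String[] args) {
        // Unicode block of the input character.
        System.out.println(Character.UnicodeBlock.of(JANUARY.codePointAt(0)));

        // One input character becomes two output characters under NFKC.
        String normalized = nfkc(JANUARY);
        System.out.println("input length=" + JANUARY.length()
            + ", output length=" + normalized.length());

        // List each output code point and whether it is a combining mark.
        normalized.codePoints().forEach(cp ->
            System.out.printf("U+%04X combining=%b%n", cp, isCombiningMark(cp)));
    }
}
```

Since both output characters come from the same input character, this is consistent with the expectation quoted above that they should share the same offsets.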