maomao905 commented on issue #11976:
URL: https://github.com/apache/lucene/issues/11976#issuecomment-1327957480

   Thanks! 
   
   > both 1 and 月 should have same offsets as they come from same input character ㋀
   
   I ran your suggested test and it failed. I am not sure whether this is a bug or not.
   The end offset of character `1` is 2 (expected offset is 3).
   ```
   $ ./gradlew test --tests org.apache.lucene.analysis.icu.TestICUNormalizer2CharFilter.testDecomposeFromSameInputCharacter
   ...
   org.apache.lucene.analysis.icu.TestICUNormalizer2CharFilter > testDecomposeFromSameInputCharacter FAILED
       java.lang.AssertionError: endOffset 2 term=1 expected:<3> but was:<2>
   ...
   ```
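
For reference, here is a minimal sketch of the offsets one would expect if every output character inherits the span of the input character it came from. It uses Python's standard `unicodedata` module rather than ICU, normalizes one input character at a time (a simplification that ignores composition across characters), and the helper name `expected_offsets` is hypothetical. This is not how `ICUNormalizer2CharFilter` is implemented.

```python
import unicodedata


def expected_offsets(text):
    """Return (output_char, start, end) triples for the NFKC form of text.

    Assumption for illustration: each input character is normalized on its
    own, and every character it produces keeps that input character's span.
    Offsets are counted in code points, which match UTF-16 offsets only for
    characters in the Basic Multilingual Plane.
    """
    result = []
    for i, ch in enumerate(text):
        for out in unicodedata.normalize("NFKC", ch):
            result.append((out, i, i + 1))
    return result


# U+32C0 (㋀) decomposes to "1" and "月"; both map back to the same input span.
for out, start, end in expected_offsets("\u32c0"):
    print(out, start, end)
```

Under this assumption, `1` and `月` share the same start and end offsets, which is the behavior the quoted comment describes.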
   
   > Test/issue should not be named "combining character" as there are no combining characters involved. "combining character" has a very specific meaning in unicode and this is not that.
   
   I changed the issue title from "combining character" to "compatibility character".
   `㋀` belongs to the [Enclosed CJK Letters and Months](https://en.wikipedia.org/wiki/Enclosed_CJK_Letters_and_Months) block in Unicode.
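
The distinction can be checked against the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module; the helper name `has_compatibility_decomposition` is hypothetical and only inspects the decomposition tag.

```python
import unicodedata


def has_compatibility_decomposition(ch):
    """True if the character's decomposition mapping carries a tag such as
    <compat> or <circle>; canonical decompositions have no tag."""
    return unicodedata.decomposition(ch).startswith("<")


ch = "\u32c0"  # ㋀

print(unicodedata.name(ch))           # official character name
print(unicodedata.decomposition(ch))  # tagged mapping to U+0031 and U+6708
print(unicodedata.combining(ch))      # canonical combining class; 0 means not a combining mark
print(has_compatibility_decomposition(ch))

# For contrast, U+0301 COMBINING ACUTE ACCENT is an actual combining character
# with a non-zero combining class.
print(unicodedata.combining("\u0301"))
```

`㋀` has combining class 0 and a `<compat>` decomposition, so "compatibility character" is the accurate term here.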


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


