rmuir commented on issue #11976: URL: https://github.com/apache/lucene/issues/11976#issuecomment-1327601907
The test is wrong: the start offsets are correct. The input stream is 4 characters long, so I would expect start offsets of `0, 1, 2, 2, 3` and end offsets of `1, 2, 3, 3, 4`. Both `1` and `月` should have the same offsets, because they come from the same input character `㋀`. Nothing needs to go backwards.

#9820 is not related; it is just a catch-all for misunderstandings about offsets.

The test and the issue should not be named "combining character", because no combining characters are involved. "Combining character" has a very specific meaning in Unicode, and this is not that.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
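To make the offset rule concrete, here is a minimal, self-contained Java sketch with no Lucene dependency. It assumes a hypothetical 4-character input `ab㋀c` (not the actual test input from the issue) and uses a hypothetical `OffsetSketch` class with a `Token` record. These are illustrative names, not Lucene's API. Each input code point is normalized on its own with `java.text.Normalizer` (NFKC), which is a simplification rather than how Lucene's char filters work. Every output code point inherits the offsets of the input character it came from, so `㋀` expands to `1` and `月` and both get `[2, 3)`.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {
    // Hypothetical token: emitted text plus offsets into the ORIGINAL input.
    record Token(String text, int startOffset, int endOffset) {}

    // Normalizes each input code point with NFKC (a simplification) and emits
    // one token per output code point. Each output code point reuses the offsets
    // of the single input code point that produced it, so offsets never go backwards.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp);
            String normalized = Normalizer.normalize(input.substring(i, i + len), Normalizer.Form.NFKC);
            int start = i;
            int end = i + len;
            for (int out : normalized.codePoints().toArray()) {
                tokens.add(new Token(new String(Character.toChars(out)), start, end));
            }
            i += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical 4-character input; only U+32C0 (㋀) expands under NFKC.
        String input = "ab\u32C0c";
        for (Token t : tokenize(input)) {
            System.out.println(t.text() + " [" + t.startOffset() + ", " + t.endOffset() + ")");
        }
    }
}
```

For this input the sketch yields start offsets `0, 1, 2, 2, 3` and end offsets `1, 2, 3, 3, 4`, matching the expectation above: the two tokens produced from `㋀` share its offsets.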