gsmiller commented on issue #12451: URL: https://github.com/apache/lucene/issues/12451#issuecomment-1644713373
I've tracked it down to this exact sequence being added: U+FFFF (code point 65535, UTF-8 bytes `0xef 0xbf 0xbf`) followed by U+10000 (code point 65536, UTF-8 bytes `0xf0 0x90 0x80 0x80`). Note that these code points are immediately adjacent. If you shift either of them by one (down or up), the bug no longer reproduces. I have to step away now but will try to spend more time on it soon.

You can reproduce this by adding the following code immediately after line 127 in `TestStringsToAutomaton`:

```
sorted = new ArrayList<BytesRef>();
sorted.add(new BytesRef(new byte[]{(byte) 0xef, (byte) 0xbf, (byte) 0xbf}));
sorted.add(new BytesRef(new byte[]{(byte) 0xf0, (byte) 0x90, (byte) 0x80, (byte) 0x80}));
```
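
For anyone who wants to double-check the byte sequences in the repro, here is a minimal standalone sketch (plain JDK, no Lucene; the class and helper names are illustrative and not part of the test). It encodes code points 65535 and 65536 as UTF-8 and prints the bytes. The two values are adjacent, but they sit on either side of the boundary between 3-byte and 4-byte UTF-8 encodings. Whether that boundary is what actually triggers the bug is only a guess at this point.

```java
import java.nio.charset.StandardCharsets;

class AdjacentCodePoints {
  // Encode a single Unicode code point as UTF-8 bytes.
  static byte[] utf8(int codePoint) {
    return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
  }

  // Render bytes as space-separated lowercase hex, e.g. "ef bf bf".
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(hex(utf8(0xFFFF)));  // prints "ef bf bf" (3 bytes)
    System.out.println(hex(utf8(0x10000))); // prints "f0 90 80 80" (4 bytes)
  }
}
```

The output matches the two `BytesRef` values used in the repro.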