gsmiller commented on issue #12451:
URL: https://github.com/apache/lucene/issues/12451#issuecomment-1644713373

   I've tracked it down to this exact sequence being added:
   U+FFFF (code point 65535, UTF-8 bytes 0xef 0xbf 0xbf) followed by U+10000 
(code point 65536, UTF-8 bytes 0xf0 0x90 0x80 0x80). Note that these code points 
are immediately adjacent, and that they sit on either side of the boundary 
between 3-byte and 4-byte UTF-8 encodings. If you change either of them by one 
value down/up, the bug no longer reproduces. I have to step away now but will 
try to spend more time on it soon. 
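
   To double-check the byte sequences quoted above, here is a minimal standalone sketch using only the JDK (no Lucene classes). The class name `Utf8Boundary` and its helper methods are hypothetical names made up for this illustration; they are not part of Lucene or of the test.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Lucene.
public class Utf8Boundary {

    // Encode a single Unicode code point as UTF-8 bytes.
    public static byte[] encode(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated lowercase hex, e.g. "0xef 0xbf 0xbf".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int last3Byte = 0xFFFF;   // 65535: highest code point with a 3-byte UTF-8 encoding
        int first4Byte = 0x10000; // 65536: lowest code point with a 4-byte UTF-8 encoding
        System.out.println(hex(encode(last3Byte)));  // prints "0xef 0xbf 0xbf"
        System.out.println(hex(encode(first4Byte))); // prints "0xf0 0x90 0x80 0x80"
    }
}
```

   This only confirms that the two inputs are the UTF-8 encodings of adjacent code points. It does not exercise the automaton code or show where the bug is.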
   
   You can reproduce this by adding the following code immediately after line 
127 in `TestStringsToAutomaton`:
   ```
         sorted = new ArrayList<BytesRef>();
         sorted.add(new BytesRef(new byte[]{(byte) 0xef, (byte) 0xbf, (byte) 0xbf}));
         sorted.add(new BytesRef(new byte[]{(byte) 0xf0, (byte) 0x90, (byte) 0x80, (byte) 0x80}));
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
