iverase commented on code in PR #13948:
URL: https://github.com/apache/lucene/pull/13948#discussion_r1877894071
########## lucene/core/src/java/org/apache/lucene/util/UnicodeUtil.java: ##########
@@ -627,35 +629,58 @@ public static String toHexString(String s) {
   }
 
   /**
-   * Interprets the given byte array as UTF-8 and converts to UTF-16. It is the responsibility of
-   * the caller to make sure that the destination array is large enough.
+   * Interprets the given {@link RandomAccessInput} slice as UTF-8 and converts to UTF-16. It is the
+   * responsibility of the caller to make sure that the destination array is large enough.
+   *
+   * <p>NOTE: Full characters are read, even if this reads past the length passed (and can result in
+   * an IOException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
+   */
+  // TODO: broken if chars.offset != 0
+  public static int UTF8toUTF16(RandomAccessInput input, long offset, int length, char[] out)

Review Comment:
   I reverted these changes. If we find that this is performance-sensitive, we can re-add it.
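The javadoc in the diff describes only the contract of the reverted method: decode a UTF-8 slice of a random-access input into a caller-sized `char[]`, always finishing a multi-byte character even if that reads past `offset + length`, and never validating the input. The following is a minimal sketch of that contract, not the code from the PR. Assumptions: `ByteSource` is a hypothetical stand-in for `RandomAccessInput` (only a `readByte(long pos)` accessor is used), and the loop is the usual branch-on-lead-byte decoding pattern.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class Utf8Sketch {

  /** Hypothetical stand-in for a random-access byte source such as RandomAccessInput. */
  public interface ByteSource {
    byte readByte(long pos) throws IOException;
  }

  /**
   * Decodes UTF-8 starting at offset into UTF-16 chars in out and returns the number of chars
   * written. A character whose lead byte lies before offset + length is read in full, even if its
   * continuation bytes lie past that limit. No UTF-8 validation is performed.
   */
  public static int utf8ToUtf16(ByteSource in, long offset, int length, char[] out)
      throws IOException {
    long pos = offset;
    final long limit = offset + length;
    int outUpto = 0;
    while (pos < limit) {
      int b = in.readByte(pos++) & 0xff;
      if (b < 0xc0) {
        // 1-byte (ASCII) sequence; stray continuation bytes are not rejected
        out[outUpto++] = (char) b;
      } else if (b < 0xe0) {
        // 2-byte sequence
        out[outUpto++] = (char) (((b & 0x1f) << 6) + (in.readByte(pos++) & 0x3f));
      } else if (b < 0xf0) {
        // 3-byte sequence
        out[outUpto++] =
            (char)
                (((b & 0x0f) << 12)
                    + ((in.readByte(pos) & 0x3f) << 6)
                    + (in.readByte(pos + 1) & 0x3f));
        pos += 2;
      } else {
        // 4-byte sequence: a supplementary code point, emitted as a surrogate pair
        int codePoint =
            ((b & 0x07) << 18)
                + ((in.readByte(pos) & 0x3f) << 12)
                + ((in.readByte(pos + 1) & 0x3f) << 6)
                + (in.readByte(pos + 2) & 0x3f);
        pos += 3;
        out[outUpto++] = Character.highSurrogate(codePoint);
        out[outUpto++] = Character.lowSurrogate(codePoint);
      }
    }
    return outUpto;
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = "h\u00e9\u20ac\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
    ByteSource src = pos -> bytes[(int) pos];
    // UTF-16 never needs more chars than the UTF-8 input has bytes
    char[] out = new char[bytes.length];
    int n = utf8ToUtf16(src, 0, bytes.length, out);
    System.out.println(new String(out, 0, n));
  }
}
```

With `length = 2` on the three UTF-8 bytes of "hé", the sketch still returns 2 chars: the byte at index 1 is the lead byte of a 2-byte sequence, so its continuation byte at index 2 is read even though it lies past `offset + length`, which is the behavior the NOTE in the javadoc warns about.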