Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

via GitHub Tue, 10 Dec 2024 03:08:58 -0800


iverase commented on code in PR #13948:
URL: https://github.com/apache/lucene/pull/13948#discussion_r1877894071



##########
lucene/core/src/java/org/apache/lucene/util/UnicodeUtil.java:
##########
@@ -627,35 +629,58 @@ public static String toHexString(String s) {
   }
 
   /**
-   * Interprets the given byte array as UTF-8 and converts to UTF-16. It is the responsibility of
-   * the caller to make sure that the destination array is large enough.
+   * Interprets the given {@link RandomAccessInput} slice as UTF-8 and converts to UTF-16. It is the
+   * responsibility of the caller to make sure that the destination array is large enough.
+   *
+   * <p>NOTE: Full characters are read, even if this reads past the length passed (and can result in
+   * an IOException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
+   */
+  // TODO: broken if chars.offset != 0
+  public static int UTF8toUTF16(RandomAccessInput input, long offset, int length, char[] out)

Review Comment:
   I reverted these changes. If we see that this is performance-sensitive, we can re-add it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


