Re: [PR] Core: Fix UnicodeUtil#truncateStringMax returns malformed string. [iceberg]

via GitHub Thu, 19 Sep 2024 14:16:17 -0700


RussellSpitzer commented on code in PR #11161:
URL: https://github.com/apache/iceberg/pull/11161#discussion_r1767604741



##########
api/src/main/java/org/apache/iceberg/util/UnicodeUtil.java:
##########
@@ -93,4 +93,24 @@ public static Literal<CharSequence> truncateStringMax(Literal<CharSequence> inpu
     }
     return null; // Cannot find a valid upper bound
   }
+
+  private static int incrementCodePoint(int codePoint) {
+    // surrogate code points are not Unicode scalar values,
+    // any UTF-8 byte sequence that would otherwise map to code points U+D800..U+DFFF is ill-formed.
+    // see https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G27288
+    Preconditions.checkArgument(

Review Comment:
   Just a minor point, but shouldn't this only be relevant if we somehow
get non-Unicode binary data in a Unicode string? That shouldn't be possible
in a Java string, right?
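
On the question of whether a Java string can hold such values: a `String` is a sequence of UTF-16 code units, so an unpaired surrogate can be constructed directly. The following is a minimal sketch using only JDK calls. The class name `SurrogateCheck` and the helper `hasUnpairedSurrogate` are hypothetical and not part of the PR.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {
  // Hypothetical helper (not from the PR): returns true if the string contains
  // a surrogate code unit that is not part of a valid high/low pair.
  public static boolean hasUnpairedSurrogate(String s) {
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      // codePointAt returns the raw surrogate value when the unit is unpaired,
      // and the combined supplementary code point when the pair is valid.
      if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
        return true;
      }
      i += Character.charCount(cp);
    }
    return false;
  }

  public static void main(String[] args) {
    String lone = "a" + '\uD800';      // 'a' followed by a lone high surrogate
    String paired = "a\uD83D\uDE00";   // 'a' followed by a valid surrogate pair

    System.out.println(hasUnpairedSurrogate(lone));
    System.out.println(hasUnpairedSurrogate(paired));

    // A lone surrogate has no well-formed UTF-8 encoding; String.getBytes
    // substitutes the charset's replacement byte instead of throwing.
    byte[] utf8 = lone.getBytes(StandardCharsets.UTF_8);
    System.out.println(utf8.length);
  }
}
```

Whether such strings can actually reach `truncateStringMax` depends on how the input is produced, which this sketch does not address.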



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


