amogh-jahagirdar commented on code in PR #12657:
URL: https://github.com/apache/iceberg/pull/12657#discussion_r2017048228


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkBucketFunction.java:
##########
@@ -74,7 +74,7 @@ public void testSpecValues() {
         .as("Spec example: hash(2017-11-16T22:31:08) = -2047944441")
         .isEqualTo(-2047944441);
 
-    assertThat(new BucketFunction.BucketString().hash("iceberg"))
+    assertThat(new 
BucketFunction.BucketString().hash("iceberg".getBytes(StandardCharsets.UTF_8)))
         .as("Spec example: hash(\"iceberg\") = 1210000089")

Review Comment:
   See comment above about keeping hash(String), instead of changing the 
existing assertion I think we should leave it as is and add a new assertion 
that asserts that 
hash("iceberg.getBytes(StandardCharsets.UTF_8)").equals(hash("iceberg"))



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/functions/BucketFunction.java:
##########
@@ -214,12 +214,11 @@ public static Integer invoke(int numBuckets, UTF8String 
value) {
         return null;
       }
 
-      // TODO - We can probably hash the bytes directly given they're already 
UTF-8 input.
-      return apply(numBuckets, hash(value.toString()));
+      return apply(numBuckets, hash(value.getBytes()));
     }
 
     // Visible for testing
-    public static int hash(String value) {
+    public static int hash(byte[] value) {
       return BucketUtil.hash(value);
     }

Review Comment:
   Even though it's a bit duplicative, I think I'd prefer to keep the existing 
`hash(String value)` function mainly for validation in the test that 
hash(String) == hash(utf8Bytes). My rationale is that since hash(String) has 
been around a while we want real confidence that the new values being produced 
are the same, and the way to get that confidence is to directly compare the new 
output against the existing output. Compatibility is a secondary benefit here 
just because we don't really care too much about compatibility in this module 
imo. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to