xiaoxuandev commented on code in PR #12657: URL: https://github.com/apache/iceberg/pull/12657#discussion_r2017433032
########## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/functions/BucketFunction.java: ########## @@ -214,12 +214,11 @@ public static Integer invoke(int numBuckets, UTF8String value) { return null; } - // TODO - We can probably hash the bytes directly given they're already UTF-8 input. - return apply(numBuckets, hash(value.toString())); + return apply(numBuckets, hash(value.getBytes())); } // Visible for testing - public static int hash(String value) { + public static int hash(byte[] value) { return BucketUtil.hash(value); } Review Comment: Thanks Amogh for the review! That makes sense. Keeping `hash(String value)` for validation ensures correctness and avoids regressions. I will add a test case to verify that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org