ankitsultana commented on code in PR #12538:
URL: https://github.com/apache/pinot/pull/12538#discussion_r1509906166


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/HashUtils.java:
##########
@@ -36,6 +40,35 @@ public static byte[] hashMD5(byte[] bytes) {
     return Hashing.md5().hashBytes(bytes).asBytes();
   }
 
+  /**
+   * For use-cases where the primary-key is set to columns that are guaranteed to be type-4 UUIDs, this hash-function
+   * will reduce the number of bytes required from 36 to 16 for each UUID, without losing any precision. This leverages
+   * the fact that a type-4 UUID is essentially a 16-byte value.
+   */
+  public static byte[] hashUUIDv4(byte[] bytes) {
+    if (bytes.length % 36 != 0) {
+      return bytes;
+    }
+    byte[] resultBytes = new byte[(bytes.length / 36) * 16];
+    ByteBuffer byteBuffer = ByteBuffer.wrap(resultBytes).order(ByteOrder.BIG_ENDIAN);
+    for (int chunk = 0; chunk < bytes.length; chunk += 36) {
+      byte[] tempBytes = new byte[36];
+      System.arraycopy(bytes, chunk, tempBytes, 0, tempBytes.length);
+      UUID uuid;
+      try {
+        uuid = UUID.fromString(new String(tempBytes, StandardCharsets.UTF_8));
+      } catch (Exception e) {
+        // In case of failures, make the hash no-op.
+        return bytes;

Review Comment:
   self-review: is it safe to return the input array as is? Ideally we should return a copy, so that the result never aliases the caller's array.
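
A minimal sketch of what the copy-on-fallback variant might look like. The class name `UuidHashSketch` is hypothetical, and the two `putLong` calls at the end of the loop are an assumption, since the diff hunk above is cut off before that point. Both fallback paths return `Arrays.copyOf(...)` rather than the input reference:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical stand-alone class; in the PR the method lives in HashUtils.
public class UuidHashSketch {
  private static final int UUID_STRING_LENGTH = 36;
  private static final int UUID_BYTE_LENGTH = 16;

  public static byte[] hashUUIDv4(byte[] bytes) {
    if (bytes.length % UUID_STRING_LENGTH != 0) {
      // Fallback: return a defensive copy instead of the caller's array.
      return Arrays.copyOf(bytes, bytes.length);
    }
    byte[] resultBytes = new byte[(bytes.length / UUID_STRING_LENGTH) * UUID_BYTE_LENGTH];
    ByteBuffer byteBuffer = ByteBuffer.wrap(resultBytes).order(ByteOrder.BIG_ENDIAN);
    for (int chunk = 0; chunk < bytes.length; chunk += UUID_STRING_LENGTH) {
      UUID uuid;
      try {
        // Decode the 36-character chunk directly, without a temporary array.
        uuid = UUID.fromString(new String(bytes, chunk, UUID_STRING_LENGTH, StandardCharsets.UTF_8));
      } catch (IllegalArgumentException e) {
        // Fallback: the hash is a no-op, but the result is still a copy.
        return Arrays.copyOf(bytes, bytes.length);
      }
      // Assumed continuation (not visible in the diff): write the 128 bits as two longs.
      byteBuffer.putLong(uuid.getMostSignificantBits());
      byteBuffer.putLong(uuid.getLeastSignificantBits());
    }
    return resultBytes;
  }
}
```

With this shape, a caller that later mutates the returned array cannot affect the array it passed in, at the cost of one extra allocation on the fallback paths.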



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

