ankitsultana commented on code in PR #12538: URL: https://github.com/apache/pinot/pull/12538#discussion_r1509906166
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/HashUtils.java:
##########
@@ -36,6 +40,35 @@ public static byte[] hashMD5(byte[] bytes) {
     return Hashing.md5().hashBytes(bytes).asBytes();
   }
 
+  /**
+   * For use-cases where the primary-key is set to columns that are guaranteed to be type-4 UUIDs, this hash-function
+   * will reduce the number of bytes required from 36 to 16 for each UUID, without losing any precision. This leverages
+   * the fact that a type-4 UUID is essentially a 16-byte value.
+   */
+  public static byte[] hashUUIDv4(byte[] bytes) {
+    if (bytes.length % 36 != 0) {
+      return bytes;
+    }
+    byte[] resultBytes = new byte[(bytes.length / 36) * 16];
+    ByteBuffer byteBuffer = ByteBuffer.wrap(resultBytes).order(ByteOrder.BIG_ENDIAN);
+    for (int chunk = 0; chunk < bytes.length; chunk += 36) {
+      byte[] tempBytes = new byte[36];
+      System.arraycopy(bytes, chunk, tempBytes, 0, tempBytes.length);
+      UUID uuid;
+      try {
+        uuid = UUID.fromString(new String(tempBytes, StandardCharsets.UTF_8));
+      } catch (Exception e) {
+        // In case of failures, make the hash no-op.
+        return bytes;

Review Comment:
   self-review: is it safe to return the input as is? Ideally we should create a copy.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
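Following up on the self-review question about returning the input as is, here is a minimal sketch. It is not the code under review. It shows how both fallback paths could return `bytes.clone()` instead of the caller's array, so a later write to the input buffer cannot silently change a key that was already hashed. The class name `UuidHashSketch`, the `decode` helper, and the canonical-form check are hypothetical additions for illustration. The packing step is not visible in the diff excerpt, so writing `getMostSignificantBits()` and `getLeastSignificantBits()` as two big-endian longs is also an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical sketch of a UUID-compacting key hash that never aliases its input on fallback. */
final class UuidHashSketch {
  // Length of the canonical text form, e.g. "123e4567-e89b-42d3-a456-426614174000".
  private static final int UUID_STRING_LENGTH = 36;
  // Length of the binary form: two big-endian 64-bit longs.
  private static final int UUID_BINARY_LENGTH = 16;

  private UuidHashSketch() {
  }

  /** Packs each 36-byte UUID string into 16 bytes; on any fallback, returns a copy of the input. */
  static byte[] hashUUIDv4(byte[] bytes) {
    if (bytes.length % UUID_STRING_LENGTH != 0) {
      // Not a whole number of UUID strings: no-op, but hand back a copy, not the caller's array.
      return bytes.clone();
    }
    byte[] resultBytes = new byte[(bytes.length / UUID_STRING_LENGTH) * UUID_BINARY_LENGTH];
    ByteBuffer byteBuffer = ByteBuffer.wrap(resultBytes).order(ByteOrder.BIG_ENDIAN);
    for (int chunk = 0; chunk < bytes.length; chunk += UUID_STRING_LENGTH) {
      // Decode the chunk directly from the input instead of copying it into a temporary array.
      String text = new String(bytes, chunk, UUID_STRING_LENGTH, StandardCharsets.UTF_8);
      UUID uuid;
      try {
        uuid = UUID.fromString(text);
      } catch (IllegalArgumentException e) {
        // UUID.fromString signals bad input with IllegalArgumentException (incl. NumberFormatException).
        return bytes.clone();
      }
      // Assumption: only the canonical lower-case spelling is compacted. UUID.fromString also accepts
      // upper-case and some non-canonical spellings, which would otherwise map to the same 16 bytes.
      if (!uuid.toString().equals(text)) {
        return bytes.clone();
      }
      byteBuffer.putLong(uuid.getMostSignificantBits());
      byteBuffer.putLong(uuid.getLeastSignificantBits());
    }
    return resultBytes;
  }

  /** Hypothetical inverse, used only to show that the packing loses no information. */
  static String decode(byte[] packed) {
    ByteBuffer buffer = ByteBuffer.wrap(packed).order(ByteOrder.BIG_ENDIAN);
    StringBuilder sb = new StringBuilder();
    while (buffer.remaining() >= UUID_BINARY_LENGTH) {
      // Arguments are evaluated left to right: most significant bits first, then least.
      sb.append(new UUID(buffer.getLong(), buffer.getLong()));
    }
    return sb.toString();
  }
}
```

With this sketch, hashing the UTF-8 bytes of `123e4567-e89b-42d3-a456-426614174000` should yield 16 bytes, and `decode` should give back the same string. A 36-byte input that is not a UUID comes back as an equal but distinct array. Whether the extra `toString()` per key for the canonical check is worth its cost is a judgment call for the actual patch.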