ColeAtCharter opened a new issue, #16562: URL: https://github.com/apache/pinot/issues/16562
Since Pinot ingestion supports using Murmur2 to compute output partitions, the SQL interface should support the MURMURHASH2 and MURMURHASH2UTF8 functions to aid in operations (e.g., `SELECT DISTINCT MOD(MURMURHASH2UTF8(my_partition_column), 25) as record_hash_value from my_table where $segmentName = 'my_segment'`).

In general, any internal functionality that affects operations should be exposed to the operator, so we should also ensure that MurmurHash 3, etc. are exposed via the SQL interface.

Proposed SQL functions:
- MURMURHASH2 - BINARY/VARBINARY input, 32-bit output
- MURMURHASH2UTF8 - CHAR/VARCHAR input, 32-bit output
- MURMURHASH2BIT64 - BINARY/VARBINARY input, 64-bit output
- MURMURHASH3BIT32 - BINARY/VARBINARY input, INT seed input, 32-bit signed output
- MURMURHASH3X64BIT32 - BINARY/VARBINARY input, INT seed input, 32-bit signed output
- MURMURHASH3BIT64 - BINARY/VARBINARY input, INT seed input, 64-bit signed output
- MURMURHASH3BIT128 - BINARY/VARBINARY input, INT seed input, 128-bit signed DOUBLE (?) output
- MURMURHASH3X64BIT128 - BINARY/VARBINARY input, seed input, 128-bit signed DOUBLE (?) output
- JAVA_HASH_CODE - BINARY/VARBINARY input, signed INT output (32-bit; needs to represent the range Integer.MIN_VALUE to Integer.MAX_VALUE)
- JAVA_HASH_CODE_BYTE_ARRAY - maybe it can alias the JAVA_HASH_CODE logic. This depends on the implementation differences between the "HashCode" and "ByteArray" logic already exposed to system operators

Nice to have:
- `*UTF8` variant of every function above that is not already named that way. These take CHAR/VARCHAR input instead of BINARY/VARBINARY

The modulo function used by ingest partitioning should already be available through the SQL interface.

Reference:
- https://docs.pinot.apache.org/functions/hash-functions?ask=sql+functions#murmurhash2
- https://docs.pinot.apache.org/configuration-reference/table#table-index-config
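To make the operational check in the example query concrete, here is a minimal Python sketch of a Murmur2-then-modulo partition computation. It rests on assumptions this issue does not confirm: that ingestion uses the Kafka-style 32-bit MurmurHash2 (seed `0x9747b28c`) over the UTF-8 bytes of the value, and that the sign bit is masked off before the modulo. The names `murmur2`, `to_signed32`, and `murmur2_partition` are hypothetical helpers, not Pinot APIs.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes, seed: int = 0x9747B28C) -> int:
    """Kafka-style 32-bit MurmurHash2 (assumed variant); returns an unsigned 32-bit int."""
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32

    # Mix the input four bytes at a time, reading each block as little-endian.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Fold in the 1 to 3 trailing bytes, if any.
    tail = data[length - length % 4:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit int, as a Java int would hold it."""
    return value - (1 << 32) if value & 0x80000000 else value


def murmur2_partition(value: str, num_partitions: int) -> int:
    """Assumed partition rule: clear the sign bit, then take the modulo."""
    return (murmur2(value.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    key = "user-123"  # hypothetical partition column value
    print("signed hash:", to_signed32(murmur2(key.encode("utf-8"))))
    print("partition  :", murmur2_partition(key, 25))
```

If the SQL function returns a signed 32-bit value, as the proposed list suggests for several variants, the hash can be negative, so a plain `MOD(hash, 25)` may not equal the partition id under the masking assumption above. That behavior would need to be checked against the actual ingestion code.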
