ColeAtCharter opened a new issue, #16562:
URL: https://github.com/apache/pinot/issues/16562

   Since Pinot ingestion supports Murmur2 for computing output 
partitions, the SQL interface should support the MURMURHASH2 and 
MURMURHASH2UTF8 functions to aid in operations (e.g., `SELECT DISTINCT 
MOD(MURMURHASH2UTF8(my_partition_column), 25) AS record_hash_value FROM 
my_table WHERE $segmentName = 'my_segment'`).
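
   For context, the value such a query is meant to reproduce can be sketched 
in Java. This is a minimal sketch, not Pinot's code: it assumes the 
Kafka-style 32-bit MurmurHash2 (fixed seed `0x9747b28c`) and assumes the sign 
bit is masked before the modulo. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Murmur2PartitionSketch {
    // 32-bit MurmurHash2 with the fixed seed used by Kafka's murmur2 utility.
    // Assumption: ingestion partitioning uses this same variant.
    static int murmur2(byte[] data) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int length = data.length;
        int h = 0x9747b28c ^ length;

        // Mix four bytes at a time, read as a little-endian int.
        for (int i = 0; i + 4 <= length; i += 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Fold in the remaining 1-3 bytes; the cases fall through on purpose.
        int tail = length & ~3;
        switch (length % 4) {
            case 3:
                h ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                h ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
            default:
                break;
        }

        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Assumption: the sign bit is cleared so the result is never negative.
    static int partition(String value, int numPartitions) {
        int hash = murmur2(value.getBytes(StandardCharsets.UTF_8));
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partition("my_partition_value", 25));
    }
}
```

   The hash is a signed 32-bit value, so a plain modulo over it can come out 
negative in Java. If ingestion masks the sign bit as assumed here, the SQL 
expression would need the same treatment to match the partition ID.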
   
   In general, any internal functionality that affects operations should be 
exposed to the operator, so MurmurHash3 and the other internal hash functions 
should also be exposed via the SQL interface.
   
   Proposed SQL functions:
   - MURMURHASH2 - BINARY/VARBINARY input, 32-bit output
   - MURMURHASH2UTF8 - CHAR/VARCHAR input, 32-bit output
   - MURMURHASH2BIT64 - BINARY/VARBINARY input, 64-bit output
   - MURMURHASH3BIT32 - BINARY/VARBINARY input, INT seed input, 32-bit signed 
output
   - MURMURHASH3X64BIT32 - BINARY/VARBINARY input, INT seed input, 32-bit 
signed output
   - MURMURHASH3BIT64 - BINARY/VARBINARY input, INT seed input, 64-bit signed 
output
   - MURMURHASH3BIT128 - BINARY/VARBINARY input, INT seed input, 128-bit signed 
DOUBLE (?) output
   - MURMURHASH3X64BIT128 - BINARY/VARBINARY input, INT seed input, 128-bit 
signed DOUBLE (?) output
   - JAVA_HASH_CODE - BINARY/VARBINARY input, signed INT output (32 bit -- 
needs to represent the range Integer.MIN_VALUE to Integer.MAX_VALUE)
   - JAVA_HASH_CODE_BYTE_ARRAY - maybe it can alias the JAVA_HASH_CODE logic.  
Depends on implementation differences in "HashCode" and "ByteArray" logic 
already exposed to system operators
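
   Whether JAVA_HASH_CODE_BYTE_ARRAY can be an alias depends on which Java 
hash each one wraps. As a small sketch, assuming "HashCode" means 
`String.hashCode()` and "ByteArray" means `Arrays.hashCode(byte[])` (the 
class name below is hypothetical), the two give different results for the 
same key:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class JavaHashCodeSketch {
    // String.hashCode(): polynomial hash over the chars, starting from 0.
    static int stringHash(String value) {
        return value.hashCode();
    }

    // Arrays.hashCode(byte[]): the same polynomial over the bytes, starting from 1.
    static int byteArrayHash(byte[] value) {
        return Arrays.hashCode(value);
    }

    public static void main(String[] args) {
        String key = "abc";
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        System.out.println(stringHash(key));      // prints 96354
        System.out.println(byteArrayHash(bytes)); // prints 126145
    }
}
```

   Under that assumption, a simple alias would return values that do not 
match the byte-array logic, so the two functions would need separate 
implementations.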
   
   Nice to have:
   - *UTF8 variant of all functions above if not already named.  They have 
CHAR/VARCHAR input instead of BINARY/VARBINARY
   
   Modulo function used by ingest partitioning should already be available 
through the SQL interface.
   
   Reference
   - https://docs.pinot.apache.org/functions/hash-functions?ask=sql+functions#murmurhash2
   - https://docs.pinot.apache.org/configuration-reference/table#table-index-config
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

