ankitsultana opened a new issue, #16619: URL: https://github.com/apache/pinot/issues/16619
Creating an issue to gauge community interest.

UUIDs are super common in Pinot, and I am sure this is not limited to us at Uber. At present, users can only use String columns to store UUIDs. This means that:

* **Storage:** They are stored as 36 bytes in uncompressed file formats.
* **Scans:** They are scanned as 36 bytes, and during conversion to String another 36 bytes are allocated for the String's internal byte buffer. While the allocation buffer is usually reused, the String's internal buffer is not. Moreover, it costs additional CPU cycles to copy the larger byte count.
* **In-Memory Representation:** After scanning, UUIDs are passed around as 36-byte values. This adds memory/CPU overhead to many operations, such as data shuffles and open-addressed hash table comparisons.

A UUID is ultimately two long values and can be represented in only 16 bytes.

There are some usability benefits too, but regardless, we wanted to share this as something we are exploring. Last year we released the UUID Hash Function for Upsert Primary Keys, and it has been quite useful at Uber for increasing the per-server primary key capacity: #12538
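As a rough illustration of the 36-byte versus 16-byte difference described in this issue, here is a minimal Java sketch that packs a `java.util.UUID` into two longs and back. The class and method names (`UuidBytes`, `toBytes`, `fromBytes`) are hypothetical and not part of Pinot, and the big-endian layout is an assumption made for the example, not a proposed storage format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration only; not a Pinot API.
public class UuidBytes {

    // Pack a UUID into 16 bytes: most-significant long first, then
    // least-significant long (ByteBuffer defaults to big-endian).
    public static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from the 16-byte form produced by toBytes.
    public static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

        // Canonical text form: 32 hex digits plus 4 hyphens.
        byte[] asText = uuid.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] packed = toBytes(uuid);

        System.out.println(asText.length);                  // 36
        System.out.println(packed.length);                  // 16
        System.out.println(fromBytes(packed).equals(uuid)); // true
    }
}
```

The packed form also makes equality and hashing a matter of comparing two longs rather than 36 characters, which is the kind of saving the hash table and shuffle points above refer to.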
