ankitsultana opened a new issue, #16619:
URL: https://github.com/apache/pinot/issues/16619

   Creating this issue to gauge community interest.
   
   UUIDs are very common in Pinot, and I am sure this is not limited to us at 
Uber. At present, users can only store UUIDs in String columns. This 
means that:
   
   * **Storage:** Each UUID takes 36 bytes in uncompressed file formats.
   * **Scans:** Each UUID is scanned as 36 bytes, and conversion to String 
allocates another 36 bytes for the String's internal byte buffer. While the 
scan buffer is usually reused, the String's internal buffer is not. 
Moreover, copying the larger byte count costs additional CPU cycles.
   * **In-Memory Representation:** After scanning, UUIDs are passed around as 
36-byte values. This adds memory and CPU overhead to many operations, such as 
data shuffles and open-addressed hash table comparisons.
   
   A UUID is ultimately two long values (128 bits) and can be represented in 
only 16 bytes. There are some usability benefits too, but regardless, we wanted 
to share this as something we are exploring.
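
   To make the size difference concrete, here is a minimal sketch using only 
the JDK's `java.util.UUID` and `java.nio.ByteBuffer`. The class and method 
names (`UuidBytesSketch`, `toBytes`, `fromBytes`) are hypothetical and chosen 
for illustration; this is not Pinot code and does not reflect how a native 
UUID type would actually be implemented there.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of Pinot: shows the 36-byte vs 16-byte forms.
final class UuidBytesSketch {

    // Pack a canonical UUID string into 16 bytes:
    // most significant long first, then least significant long (big-endian).
    static byte[] toBytes(String text) {
        UUID uuid = UUID.fromString(text);
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Rebuild the canonical (lowercase, hyphenated) string from the 16 bytes.
    static String fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb).toString();
    }

    public static void main(String[] args) {
        String text = "123e4567-e89b-12d3-a456-426614174000";
        byte[] asString = text.getBytes(StandardCharsets.US_ASCII);
        byte[] packed = toBytes(text);

        System.out.println("string form bytes: " + asString.length);
        System.out.println("packed form bytes: " + packed.length);
        System.out.println("round trip ok: " + fromBytes(packed).equals(text));
    }
}
```

   The packed form also compares as two `long` comparisons rather than a 
36-byte comparison, which is the kind of saving the hash table point above is 
about.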
   
   Last year we released the UUID hash function for upsert primary keys, 
which has been quite useful at Uber for increasing the per-server primary key 
capacity: #12538


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

