satishd commented on PR #12538:
URL: https://github.com/apache/pinot/pull/12538#issuecomment-2032592795

   >@satishd I guess you mean collision between two primary keys where one is 
UUID conformant and the other is not?
   
   Right.
   
   >Unless we prepend the length of the string in the UUID conformant 
serialization scheme, there will always be theoretical chances of collision. I 
think prepending the length of the string removes chances of such collision 
beyond any reasonable doubt. I got rid of the null terminated strings because 
those felt too easy to collide (e.g. (foo, bar) and (fo, obar)).
   
   @ankitsultana Other approach like falling back to Murmur3 may create 
collisions but reduce size. I understand that you are trying to make a tradeoff 
here for consistency and storage by trying to generate the unique keys even 
when the values are not conforming to the respective UUID format. But it can 
cause collisions in very rare scenarios. This is because it uses 
`value.toString().getBytes(StandardCharsets.UTF_8)`. The null values and 
strings as "null" result the same based on the concat method implementation.
   
   These are rare scenarios and the proposed implementation seems reasonable to 
me. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to