eric-maynard commented on PR #1966:
URL: https://github.com/apache/polaris/pull/1966#issuecomment-3045822019

   Good point @MonkeyCanCode re: the name. A previous version had a truly 
random prefix, but now it's deterministic based on a hash of the table 
identifier. I don't actually have a preference between the two or think it 
matters. Do you have a suggestion for a better name, though? Iceberg calls 
their similar feature `write.object-storage.enabled`, which I think is quite 
misleading.
   
   Re (2), you're right to consider the risk of hash collisions, especially 
when we only use 20 bits. With 2^20 (about 1M) possible prefixes, collisions 
are guaranteed once there are more identifiers than prefixes, and by the 
birthday bound at least one collision becomes likely after only about a 
thousand identifiers. However, MurmurHash exhibits strong avalanche 
properties, meaning even small differences in the identifier (like the A vs. 
B) produce vastly different outputs. So while collisions are possible, the 
chance that any given pair of structurally similar identifiers (e.g., 
namespaceA.tableX vs. namespaceB.tableX) collides in the 20-bit prefix is 
still low, roughly 1 in 2^20.
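
   To put rough numbers on the 20-bit case, here is a small sketch. It assumes 
the hash spreads identifiers uniformly over the 2^20 prefixes, which is an 
idealization, and the function names are made up for illustration rather 
than taken from the PR.

```python
import math

PREFIX_BITS = 20              # prefix width discussed in the comment
BUCKETS = 2 ** PREFIX_BITS    # 1,048,576 possible prefixes


def any_collision_probability(n: int, buckets: int = BUCKETS) -> float:
    """Approximate chance that at least two of n identifiers share a prefix.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / (2 * buckets)),
    assuming prefixes are uniformly distributed.
    """
    if n > buckets:
        return 1.0  # pigeonhole: more identifiers than prefixes
    return 1.0 - math.exp(-n * (n - 1) / (2 * buckets))


def pair_collision_probability(buckets: int = BUCKETS) -> float:
    """Chance that two specific identifiers land on the same prefix."""
    return 1.0 / buckets


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} identifiers: P(any collision) ~ {any_collision_probability(n):.4f}")
    print(f"specific pair: P(collision) ~ {pair_collision_probability():.2e}")
```

   The two functions answer different questions: whether some collision exists 
anywhere in the catalog, versus whether two particular tables share a prefix.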
   
   As for uniqueness, since the full path still includes the actual identifier, 
there's no ambiguity or risk of overwriting. The sibling check looks at the 
full path.
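
   As a rough illustration of why a prefix collision does not cause ambiguity, 
here is a sketch. The path layout, the choice of the 32-bit MurmurHash3 
variant with seed 0, taking the top 20 bits, and all function names are 
assumptions for illustration; the PR's actual code may differ on each of 
these points.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python 32-bit MurmurHash3 (x86 variant)."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    full = n & ~3  # length of the part made of whole 4-byte blocks

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    for i in range(0, full, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask

    tail = data[full:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Finalization: fold in the length and force avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


PREFIX_BITS = 20  # assumed: top 20 bits of the 32-bit hash


def hash_prefix(identifier: str) -> str:
    """Deterministic 20-bit prefix, rendered as a binary string."""
    top_bits = murmur3_32(identifier.encode("utf-8")) >> (32 - PREFIX_BITS)
    return format(top_bits, f"0{PREFIX_BITS}b")


def table_path(base: str, namespace: str, table: str) -> str:
    """Hypothetical layout: <base>/<prefix>/<namespace>/<table>/."""
    identifier = f"{namespace}.{table}"
    return f"{base}/{hash_prefix(identifier)}/{namespace}/{table}/"


if __name__ == "__main__":
    for ns in ("namespaceA", "namespaceB"):
        print(table_path("s3://bucket/base", ns, "tableX"))
```

   Because the identifier itself is part of the path, two tables that happen 
to share a prefix still end up at different locations.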
   
   


