MonkeyCanCode commented on PR #1966:
URL: https://github.com/apache/polaris/pull/1966#issuecomment-3045833941

   > Good point @MonkeyCanCode re: the name. A previous version had a truly
   > random prefix, but now it's deterministic based on a hash of the table
   > identifier. I don't actually have a preference between the two or think it
   > matters. Do you have a suggestion for a better name, though? Iceberg calls
   > their similar feature `write.object-storage.enabled`, which I think is
   > quite misleading.
   >
   > Re (2), you're right to consider the risk of hash collisions esp. when we
   > only use 20 bits. In fact, collisions are inevitable as you approach 1M
   > identifiers. However MurmurHash exhibits strong avalanche properties,
   > meaning even small differences in the identifier (like the A vs B) produce
   > vastly different outputs. So while collisions are possible, the chance
   > that structurally similar identifiers (e.g., namespaceA.tableX vs.
   > namespaceB.tableX) collide in the 20-bit prefix is still low.
   >
   > As for uniqueness, since the full path still includes the actual
   > identifier, there's no ambiguity or risk of overwriting. The sibling check
   > looks at the full path.
   
   Thanks for the confirmation.
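
For reference, here is a rough sketch of the collision arithmetic behind the quoted reply. It assumes the 20-bit prefix is uniformly distributed (which a well-mixed hash such as MurmurHash should approximate) and uses the standard birthday approximation. The function and constant names are illustrative only and do not come from the PR.

```python
import math

# Assumption: the prefix is 20 bits wide, as stated in the quoted reply.
PREFIX_BITS = 20
BUCKETS = 2 ** PREFIX_BITS  # 1,048,576 distinct prefixes


def pair_collision_probability(buckets: int = BUCKETS) -> float:
    """Chance that two specific identifiers share a prefix (uniform hash assumed)."""
    return 1.0 / buckets


def any_collision_probability(n: int, buckets: int = BUCKETS) -> float:
    """Approximate chance that at least two of n identifiers share a prefix.

    Birthday approximation: 1 - exp(-n * (n - 1) / (2 * buckets)).
    """
    return 1.0 - math.exp(-n * (n - 1) / (2 * buckets))


if __name__ == "__main__":
    print(f"two given identifiers: {pair_collision_probability():.2e}")
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} identifiers: {any_collision_probability(n):.4f}")
```

Under these assumptions, any two given identifiers (such as namespaceA.tableX and namespaceB.tableX) collide with probability of about one in a million, while some collision somewhere in the catalog becomes likely well before 1M identifiers. As the quoted reply notes, this does not affect correctness because the full path still contains the actual identifier.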


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
