eric-maynard commented on PR #1966: URL: https://github.com/apache/polaris/pull/1966#issuecomment-3045822019
Good point @MonkeyCanCode re: the name. A previous version used a truly random prefix, but the prefix is now deterministic, derived from a hash of the table identifier. I don't have a preference between the two, and I don't think it matters much. Do you have a suggestion for a better name, though? Iceberg calls its similar feature `write.object-storage.enabled`, which I think is quite misleading.

Re (2), you're right to consider the risk of hash collisions, especially since we only use 20 bits. With 2^20 (about 1M) possible prefixes, collisions are guaranteed once you exceed roughly 1M identifiers, and by the birthday bound they become likely much earlier, at around 1,200 identifiers. However, MurmurHash exhibits strong avalanche properties, meaning even small differences in the identifier (like the A vs. B) produce vastly different outputs. So while collisions are possible, the chance that structurally similar identifiers (e.g., namespaceA.tableX vs. namespaceB.tableX) collide in the 20-bit prefix is still low.

As for uniqueness, since the full path still includes the actual identifier, there's no ambiguity or risk of overwriting. The sibling check looks at the full path.
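To make the collision argument concrete, here is a minimal Python sketch of a deterministic 20-bit prefix and the corresponding birthday-style collision estimate. It is illustrative only and not the code in this PR: SHA-256 stands in for MurmurHash (which is not in the Python standard library), and `hash_prefix`, `table_location`, and the `base/prefix/identifier` layout are hypothetical names and structure chosen for the example.

```python
import hashlib
import math

PREFIX_BITS = 20  # prefix width mentioned in the comment


def hash_prefix(identifier: str, bits: int = PREFIX_BITS) -> str:
    """Return a deterministic binary prefix for a table identifier.

    Assumption: SHA-256 is a stand-in for MurmurHash. Only the top
    `bits` bits (at most 32) of the digest are kept.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big") >> (32 - bits)
    return format(value, f"0{bits}b")


def table_location(base: str, identifier: str) -> str:
    """Hypothetical layout: base / hash prefix / full identifier.

    Because the identifier itself is part of the path, two tables
    whose prefixes collide still get distinct locations.
    """
    return f"{base}/{hash_prefix(identifier)}/{identifier.replace('.', '/')}"


def collision_probability(n: int, bits: int = PREFIX_BITS) -> float:
    """Probability that at least two of n identifiers share a prefix,
    assuming prefixes are uniformly distributed (birthday problem)."""
    buckets = 2 ** bits
    if n > buckets:
        return 1.0  # pigeonhole: more identifiers than prefixes
    # log P(no collision) = sum over i of log(1 - i / buckets)
    log_no_collision = sum(math.log1p(-i / buckets) for i in range(n))
    return 1.0 - math.exp(log_no_collision)
```

Under the uniformity assumption, `collision_probability(1206)` comes out close to 0.5, and any count above 2^20 returns exactly 1.0. A shared prefix only means two tables land under the same prefix directory; `table_location` still returns different paths for different identifiers.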
