MonkeyCanCode commented on PR #1966: URL: https://github.com/apache/polaris/pull/1966#issuecomment-3034493090
Nice work on this piece. The approach looks good. Here are my two cents:

1. I think the word "random" may be a bit misleading, as the underlying hash function `murmur3_32_fixed` is deterministic: the same table identifier always maps to the same prefix.
2. Could `murmur3_32_fixed` introduce hash collisions, particularly within the 20 bits used for the prefix? For example, take two logically distinct tables, `namespaceA.tableX` and `namespaceB.tableX`. If their full table identifiers (`"namespaceA.tableX"` and `"namespaceB.tableX"`) happen to produce the same 20-bit hash prefix, their data would be co-located under the same S3 key prefix. The full S3 path would still be unique because the namespace and table name are appended, but is this a concern for the optimized sibling check or other aspects of data management?
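To make both points a bit more concrete, here is a rough Python sketch. It reimplements MurmurHash3 x86_32 with seed 0, which I believe is what Guava's `Hashing.murmur3_32_fixed()` computes, and assumes, purely for illustration, that the prefix is the low 20 bits of the hash of the UTF-8 identifier string. The PR may select the bits or build the hash input differently. `prefix_bits`, `find_prefix_collision`, and `collision_probability` are hypothetical helpers, not Polaris code.

```python
"""Sketch: a 20-bit murmur3 prefix is deterministic, and collisions are expected.

Assumption (hypothetical, not taken from the PR): the prefix is the low 20 bits
of MurmurHash3 x86_32 (seed 0) over the UTF-8 bytes of the dotted identifier.
"""

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32; tail bytes are treated as unsigned."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1-3 bytes, read little-endian.
    tail = data[4 * nblocks :]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h ^= k
    # Finalization: fold in the length, then the fmix32 avalanche step.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def prefix_bits(identifier: str, bits: int = 20) -> int:
    """Hypothetical helper: keep the low `bits` bits of the identifier's hash."""
    return murmur3_32(identifier.encode("utf-8")) & ((1 << bits) - 1)


def find_prefix_collision(bits: int = 20) -> tuple[str, str]:
    """Return two distinct identifiers whose prefixes collide.

    By the pigeonhole principle, 2**bits + 1 candidates always contain a collision.
    """
    seen: dict[int, str] = {}
    for i in range((1 << bits) + 1):
        name = f"namespace{i}.tableX"
        p = prefix_bits(name, bits)
        if p in seen:
            return seen[p], name
        seen[p] = name
    raise AssertionError("pigeonhole guarantees a collision")


def collision_probability(n: int, bits: int = 20) -> float:
    """Birthday bound: P(at least two of n identifiers share a prefix)."""
    buckets = 1 << bits
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= 1 - i / buckets
    return 1 - p_distinct


if __name__ == "__main__":
    # Deterministic: hashing the same identifier twice yields the same prefix.
    assert prefix_bits("namespaceA.tableX") == prefix_bits("namespaceA.tableX")
    a, b = find_prefix_collision()
    print(f"{a!r} and {b!r} share prefix {prefix_bits(a):05x}")
    print(f"P(collision) among 1,206 tables: {collision_probability(1206):.3f}")
```

Running it shows the same identifier always yielding the same prefix. A brute-force search over names like `namespace{i}.tableX` typically hits a shared 20-bit prefix after on the order of a thousand candidates. That matches the birthday bound: with 2^20 (about 1.05 million) possible prefixes, the chance that at least two of n tables collide reaches about 50% near n ≈ 1,200. So collisions are expected in any sizable catalog rather than being a rare edge case.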
