Hugo-WB opened a new pull request, #16837:
URL: https://github.com/apache/iceberg/pull/16837

   Addresses: https://github.com/apache/iceberg/issues/16352
   
   After putting up:  https://github.com/apache/iceberg/pull/16353 which 
cleaned encryption keys at expire_snapshot time by removing the encryption key 
associated with the removed snapshot.
   When writing E2E tests for above, it wasn't working due to 
[HiveTableOperations::doCommit](https://github.com/apache/iceberg/blob/69063fb162e835c3dc1ff7bbcf5526029771c6d0/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L253)
 always adding all keys from the EncryptionManager into the TableMetadata.
   
   From my rough understanding, sorry if this is wrong 😅, `EncryptionManager` 
is independent of `TableMetadata` and they are only reconciled at commit time. 
   This leads to issues when trying to remove keys. Even if the EncryptedKey is 
removed from the base metadata, and the new `metadata` does not have the key, 
since the `EncryptionManager` is built on top of the base metadata, it will 
still have the old keys.
   
   This PR proposes to fix the above by filtering the keys added to the 
metadata file to only the keys that are still referenced/needed by the new 
metadata file.
   
   This effectively does cleanup at runtime.
   
   Ideas for alternative solutions (Open to ideas here, I think I have a rather 
limited view here of how things are correlated with each-other):
   - Rewrite EncryptionManager to be built on top of TableMetadata. Use 
TableMetadata as source of truth.
     - EncryptionManager does not store state, delegates to TableMetadata. 
     - Alterations to EncryptionManager are immediately reflected within its 
version of TableMetadata
       - E.g. adding encryption keys or removing encryption keys.
   - Rewrite EncryptionManager to be more tightly coupled with TableMetadata.
     - Such that when RemoveSnapshot -> remove encryption key from 
EncryptionManager. 
     - Keep the "adding of all encryption keys from EncryptionManager" at 
commit time.
   
   
   This also related to the REST catalog implementation of encryption: 
https://github.com/apache/iceberg/pull/13225
   The above PR has the same issue as the Hive implementation, where it will 
always add all keys from the `EncryptionManager`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to