Hugo-WB commented on code in PR #16353:
URL: https://github.com/apache/iceberg/pull/16353#discussion_r3430033839
##########
core/src/main/java/org/apache/iceberg/encryption/EncryptionUtil.java:
##########
@@ -181,6 +185,37 @@ public static Map<String, EncryptedKey>
encryptionKeys(EncryptionManager em) {
return sem.encryptionKeys();
}
+ /**
+ * Returns {@link TableMetadata} with encrypted keys from an {@param
encryptionManager} that are
+ * required to read the snapshots in the given {@param metadata}. Only adds
keys that are still
+ * referenced in metadata.
+ */
+ public static TableMetadata addEmKeysToMetadata(
+ TableMetadata metadata, EncryptionManager encryptionManager) {
+ if (!(encryptionManager instanceof StandardEncryptionManager
standardEncryptionManager)) {
+ return metadata;
+ }
+ Set<String> referencedEncryptionKeys =
+ Sets.union(
+ metadata.snapshots().stream()
+ .map(Snapshot::keyId)
+ .filter(Objects::nonNull)
+ .collect(Collectors.toUnmodifiableSet()),
+ metadata.encryptionKeys().stream()
+ .map(EncryptedKey::encryptedById)
+ .filter(Objects::nonNull)
+ .collect(Collectors.toUnmodifiableSet()));
+ TableMetadata.Builder metadataBuilder = TableMetadata.buildFrom(metadata);
+ standardEncryptionManager.encryptionKeys().values().stream()
+ .filter(
+ encryptedKey ->
+ referencedEncryptionKeys.contains(encryptedKey.keyId())
+ // The KEK may not be referenced but must still be saved
in the metadata.
+ ||
standardEncryptionManager.keyEncryptionKeyID().equals(encryptedKey.keyId()))
+ .forEach(metadataBuilder::addEncryptionKey);
+ return metadataBuilder.build();
Review Comment:
This new logic depends on an invariance that the `manifestListKey -> KEK ->
table encryption key`. are all one "hop". E.g. the `->` means encryptedBy.
If there was a world where a key in the encryptionKeys() list is encrypted
by another key in the encryptionKey list, which is then in itself encrypted by
the KEK, and they were all added in a single commit, this PR would fail to
propagate the middle key.
E.g.:
```
EncryptionKeys:[{key: A, encryptedBy: B}, {key: B, encryptedBy: C}, {key: C,
encryptedBy: KEK}]
snapshot: [snapshotId: x, keyId: A]
```
And all of the above happens within one commit.
In the above code, we would not add key B to the metadata file. As none of
the encryptionKeys will be part of `metadata.encryptionKeys()` as that will be
empty. We will keep A and C, since A is referenced by snapshot and C is a KEK.
TLDR: question @ggershinsky : we expect the depth of encryption key
referencing from manifest list -> KEK to remain at a single hop right, we don't
expect there to be another hop?
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -245,18 +244,7 @@ protected void doCommit(TableMetadata base, TableMetadata
metadata) {
String newMetadataLocation;
EncryptionManager encrManager = encryption();
- if (encrManager instanceof StandardEncryptionManager) {
- // Add new encryption keys to the metadata
- TableMetadata.Builder builder = TableMetadata.buildFrom(metadata);
- for (Map.Entry<String, EncryptedKey> entry :
- EncryptionUtil.encryptionKeys(encrManager).entrySet()) {
- builder.addEncryptionKey(entry.getValue());
- }
-
- tableMetadata = builder.build();
- } else {
- tableMetadata = metadata;
- }
+ tableMetadata = EncryptionUtil.addEmKeysToMetadata(metadata, encrManager);
Review Comment:
cc @smaheshwar-pltr for: https://github.com/apache/iceberg/pull/13225. If
you could update your REST implementation to use this utility once it merges.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]