ggershinsky commented on code in PR #7770:
URL: https://github.com/apache/iceberg/pull/7770#discussion_r1546070409
##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -93,6 +95,16 @@ static void toJson(Snapshot snapshot, JsonGenerator
generator) throws IOExceptio
generator.writeNumberField(SCHEMA_ID, snapshot.schemaId());
}
+ if (snapshot.manifestListKeyMetadata() != null) {
+ generator.writeStringField(MANIFEST_LIST_KEY_METADATA,
snapshot.manifestListKeyMetadata());
+ }
+
+ // TODO discuss: do we need to sign the size value? Or sign the whole
snapshot?
Review Comment:
This is the question. Some thoughts on the scenarios and protection options:
- currently, we don't have a metadata.json key. We have only a key for
snapshot's manifest list file. Besides using it for encrypting the manifest
list file, we can also use this key for signing snapshot's sensitive parts like
the manifest list size field. Or for signing the whole metadata.json file
(should be possible with some effort) - then we also protect the integrity of
e.g. the table properties (like the table key id).
- snapshot (metadata.json file) doesn't keep secret values, so encrypting it
might not be required. The signatures, mentioned above, would be kept in added
snapshot fields - sufficient for detecting the file modification attacks.
- these protection techniques are not required with the REST catalog -
because we trust the catalog service (we don't trust the storage service).
Since the whole snapshot is stored in the REST catalog, we don't need to sign
anything.
- the manifest list key is not stored in the catalog. Instead, it is wrapped
in a KMS with the table master key, and stored in the snapshot
MANIFEST_LIST_KEY_METADATA field. Only the KMS-authorized (for the table key)
users/processes will be able to get the manifest list key.
- In catalogs other than the REST, the signatures provide a partial
protection - because the metadata.json is kept in the untrusted storage. With
the signatures, it can't be modified. But the whole folder can be replaced
(e.g. a replay attack - where all table files are removed, and replaced with
files of an older version of the table). To prevent this attack in non-REST
catalogs, we will have to update the catalog per each table snapshot (setting
eg the latest table version/sequence number, or a random AAD prefix)
##########
api/src/main/java/org/apache/iceberg/Snapshot.java:
##########
@@ -162,6 +162,25 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io)
{
*/
String manifestListLocation();
+ /**
+ * Return the size of this snapshot's manifest list file. Must be a verified
value, taken from a
Review Comment:
We can define this field to be required only for encrypted tables. It will
be not set in the snapshot file for unencrypted tables - where this method can
return 0 (or -1, I'll make it consistent across all implementation classes).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]