ggershinsky commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1546070409
########## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ########## @@ -93,6 +95,16 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio generator.writeNumberField(SCHEMA_ID, snapshot.schemaId()); } + if (snapshot.manifestListKeyMetadata() != null) { + generator.writeStringField(MANIFEST_LIST_KEY_METADATA, snapshot.manifestListKeyMetadata()); + } + + // TODO discuss: do we need to sign the size value? Or sign the whole snapshot? Review Comment: This is the question. Some thoughts on the scenarios and protection options: - currently, we don't have a metadata.json key. We have only a key for snapshot's manifest list file. Besides using it for encrypting the manifest list file, we can also use this key for signing snapshot's sensitive parts like the manifest list size field. Or for signing the whole metadata.json file (should be possible with some effort) - then we also protect the integrity of e.g. the table properties (like the table key id). - snapshot (metadata.json file) doesn't keep secret values, so encrypting it might not be required. The signatures, mentioned above, would be kept in added snapshot fields - sufficient for detecting the file modification attacks. - these protection techniques are not required with the REST catalog - because we trust the catalog service (we don't trust the storage service). Since the whole snapshot is stored in the REST catalog, we don't need to sign anything. - the manifest list key is not stored in the catalog. Instead, it is wrapped in a KMS with the table master key, and stored in the snapshot MANIFEST_LIST_KEY_METADATA field. Only the KMS-authorized (for the table key) users/processes will be able to get the manifest list key. - In catalogs other than the REST, the signatures provide a partial protection - because the metadata.json is kept in the untrusted storage. With the signatures, it can't be modified. But the whole folder can be replaced (e.g. a replay attack - where all table files are removed, and replaced with files of an older version of the table). To prevent this attack in non-REST catalogs, we will have to update the catalog per each table snapshot (setting eg the latest table version/sequence number, or a random AAD prefix) ########## api/src/main/java/org/apache/iceberg/Snapshot.java: ########## @@ -162,6 +162,25 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) { */ String manifestListLocation(); + /** + * Return the size of this snapshot's manifest list file. Must be a verified value, taken from a Review Comment: We can define this field to be required only for encrypted tables. It will be not set in the snapshot file for unencrypted tables - where this method can return 0 (or -1, I'll make it consistent across all implementation classes). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org