szlta commented on code in PR #16527:
URL: https://github.com/apache/iceberg/pull/16527#discussion_r3288431987

##########
format/spec.md:
##########
@@ -1213,6 +1066,45 @@ Notes:
 1. The format of encrypted key metadata is determined by the table's encryption scheme and can be a wrapped format specific to the table's KMS provider.
+#### Standard Key Metadata
+
+The `key_metadata` field in manifest entries stores per-file encryption key material as a binary blob. To enable cross-implementation interoperability, the standard encryption scheme defines the following binary format for this field:
+
+```
+VersionByte Payload
+```
+
+where:
+
+* `VersionByte` is a single byte indicating the key metadata schema version. Currently, the only valid version is `0x01`.
+* `Payload` is an Avro binary-encoded record (not a container file — only the raw binary encoding of the fields) using the schema for the given version.
+
+The Avro schema for version 1 is a record with the following fields, in order:
+
+| Field name | Avro type | Required | Description |
+|---|---|---|---|
+| **`encryption_key`** | `bytes` | _required_ | The data encryption key (DEK) for this file. Must be 16, 24, or 32 bytes (corresponding to AES-128, AES-192, or AES-256). |
+| **`aad_prefix`** | `bytes` | _optional_ | Random AAD prefix used for [AES GCM Stream](gcm-stream-spec.md) block authentication. |
+| **`file_length`** | `long` | _optional_ | The plaintext file length before encryption. Used to detect truncation attacks (see [AES GCM Stream file length](gcm-stream-spec.md#file-length)). |
+
+The AAD prefix is combined with a 4-byte little-endian block index to form the AAD for each AES GCM Stream cipher block, as described in the [AES GCM Stream AAD section](gcm-stream-spec.md#additional-authenticated-data).
+
+##### Encryption Key Hierarchy
+
+The standard encryption scheme uses a two-tier key hierarchy tracked in the table metadata `encryption-keys` list:
+
+1. **Key Encryption Keys (KEKs):** Entries where `encrypted-by-id` equals the table's encryption key ID (configured via `encryption.key-id`). The `encrypted-key-metadata` contains the KEK wrapped by the KMS and is opaque to Iceberg — its format is determined by the KMS provider.

Review Comment:
Perhaps mention here that a `KEY_TIMESTAMP` property is expected to be present for KEKs - AFAIK without it decryption flow will error out.

##########
format/spec.md:
##########
@@ -1213,6 +1066,45 @@ Notes:
 1. The format of encrypted key metadata is determined by the table's encryption scheme and can be a wrapped format specific to the table's KMS provider.
+#### Standard Key Metadata
+
+The `key_metadata` field in manifest entries stores per-file encryption key material as a binary blob. To enable cross-implementation interoperability, the standard encryption scheme defines the following binary format for this field:
+
+```
+VersionByte Payload
+```
+
+where:
+
+* `VersionByte` is a single byte indicating the key metadata schema version. Currently, the only valid version is `0x01`.
+* `Payload` is an Avro binary-encoded record (not a container file — only the raw binary encoding of the fields) using the schema for the given version.
+
+The Avro schema for version 1 is a record with the following fields, in order:
+
+| Field name | Avro type | Required | Description |
+|---|---|---|---|
+| **`encryption_key`** | `bytes` | _required_ | The data encryption key (DEK) for this file. Must be 16, 24, or 32 bytes (corresponding to AES-128, AES-192, or AES-256). |
+| **`aad_prefix`** | `bytes` | _optional_ | Random AAD prefix used for [AES GCM Stream](gcm-stream-spec.md) block authentication. |

Review Comment:
```suggestion
| **`aad_prefix`** | `bytes` | _optional_ | Random AAD prefix used for [AES GCM Stream](gcm-stream-spec.md) integrity protection. |
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
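
For readers following the `Standard Key Metadata` hunk quoted in this message, here is a minimal Python sketch of how the version 1 layout (`VersionByte` followed by a raw Avro-encoded record) might be decoded. It is only an illustration under stated assumptions: the hunk says the two trailing fields are "optional" but does not spell out their Avro representation, so the sketch assumes each is a union with `null` as branch 0 and the value as branch 1. The names `parse_key_metadata`, `_read_long`, and `_read_bytes` are hypothetical and are not Iceberg APIs.

```python
def _read_long(buf, pos):
    """Decode one Avro long (zigzag varint); return (value, next_pos)."""
    raw = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        raw |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    # Undo zigzag encoding: even raw -> non-negative, odd raw -> negative.
    return (raw >> 1) ^ -(raw & 1), pos


def _read_bytes(buf, pos):
    """Decode one Avro bytes value (long length prefix, then the data)."""
    n, pos = _read_long(buf, pos)
    if n < 0 or pos + n > len(buf):
        raise ValueError("invalid or truncated bytes field")
    return bytes(buf[pos:pos + n]), pos + n


def parse_key_metadata(blob):
    """Parse a version 1 key_metadata blob into a dict.

    ASSUMPTION: optional fields are encoded as Avro unions with
    null at branch 0 and the value at branch 1.
    """
    if len(blob) < 1 or blob[0] != 0x01:
        raise ValueError("unsupported key metadata version")
    pos = 1

    key, pos = _read_bytes(blob, pos)
    if len(key) not in (16, 24, 32):
        raise ValueError("encryption_key must be 16, 24, or 32 bytes")

    aad_prefix = None
    branch, pos = _read_long(blob, pos)
    if branch == 1:
        aad_prefix, pos = _read_bytes(blob, pos)
    elif branch != 0:
        raise ValueError("unexpected union branch for aad_prefix")

    file_length = None
    branch, pos = _read_long(blob, pos)
    if branch == 1:
        file_length, pos = _read_long(blob, pos)
    elif branch != 0:
        raise ValueError("unexpected union branch for file_length")

    return {
        "encryption_key": key,
        "aad_prefix": aad_prefix,
        "file_length": file_length,
    }
```

The decoder is hand-rolled rather than built on an Avro library so that the byte layout stays visible; a real implementation would more likely use an Avro decoder with the schema from the spec.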
