collado-mike commented on PR #1424: URL: https://github.com/apache/polaris/pull/1424#issuecomment-2859349730
> > I know there is some churn in the encryption configuration in Iceberg, but I also know that s3.sse.key is a property that's currently used to specify the KMS key associated with a table. > > The way I read [this](https://iceberg.apache.org/docs/1.6.0/aws/#s3-server-side-encryption), I believe the `s3.sse.key` property is on the catalog rather than the table - and I don't see this key being persisted in the [table metadata either](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java). I'm not an Iceberg expert, but I'm guessing that won't be right to do since the user can change this input in their session and we would not know if they can have access to this KMS key from the session itself? Because of the way we construct the `FileIO` in Polaris, we can actually pass in whatever properties we want: https://github.com/apache/polaris/blob/b93e97b9c5c97cb813696df50aee7550f21b155c/service/common/src/main/java/org/apache/polaris/service/catalog/io/DefaultFileIOFactory.java#L88-L109 . We also return the `config` field of the `LoadTableResponse` with the same properties, so the client would be able to read/write the files with the correct encryption key. Notably, this PR handles the permissions side of the KMS d/encryption, but doesn't specify how the table is configured to ensure files are written with encryption. Enabling setting the key in the table properties and exposing them via the `config` field of the `LoadTableResponse` would ensure that both the service and the client are using the same key to write data and metadata files. As of the current revision, the only way to ensure the files are written with the same key is by relying on the S3 SSE bucket configuration. That means that cases such as object-store layout (where a random prefix is inserted before the table path) couldn't ensure that different tables are encrypted with different keys. To be very clear, I _really, really, really_ want to see this KMS support merged. But I really want it because it means that we don't have to rely solely on path prefixes to guarantee security of the credentials we vend. Currently, we have to prevent table path overlap because otherwise a single set of credentials could be used to read data from tables the user doesn't have access to. By supporting SSE and credentials that specify table-level encryption keys, we can allow for table location overlap _and_ still guarantee that a given set of credentials can only be used for the table they were generated for. But that requires a very targeted IAM policy - even the encryption context alone isn't enough to support object-store layout without the ability to decrypt files for every single table under the same prefix. In my mind, both the encryption context and the specific KMS key are required to support that. For that, Approach 3 is the only viable approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
