rdblue commented on code in PR #9752: URL: https://github.com/apache/iceberg/pull/9752#discussion_r1501943394
########## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputFile.java: ########## @@ -20,39 +20,33 @@ import org.apache.iceberg.io.InputFile; import org.apache.iceberg.io.SeekableInputStream; -import org.apache.iceberg.relocated.com.google.common.base.Preconditions; public class AesGcmInputFile implements InputFile { private final InputFile sourceFile; private final byte[] dataKey; private final byte[] fileAADPrefix; private long plaintextLength; + /** + * Important: sourceFile.getLength() must return the verified plaintext content length, not the + * physical file size after encryption. This protects against tampering with the file size in + * untrusted storage systems. + */ public AesGcmInputFile(InputFile sourceFile, byte[] dataKey, byte[] fileAADPrefix) { this.sourceFile = sourceFile; this.dataKey = dataKey; this.fileAADPrefix = fileAADPrefix; - this.plaintextLength = -1; + this.plaintextLength = sourceFile.getLength(); Review Comment: I think the problem isn't just this path. It is with other valid uses of the API. For instance, if you write data into an encrypted Avro file and go to read it, it is valid to open the Avro file without calling `newInputFile(DataFile)`. You could use `newInputFile(String)` and that `InputFile` would return the file length from the object store, which is the encrypted length. If the decryption stream assumes here that the incoming `InputFile` reports the plaintext length, there's no way to know or guarantee it. I think this needs to assume that the incoming file uses the encrypted length. We still need to choose whether to store the plaintext length or the encrypted length in table metadata. From your comment here, it looks like it would be easier to store the plaintext length because that is what gets returned by the writer. I think that makes sense, but we will need to be specific in `EncryptingFileIO` what is passed through and where it needs to be converted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org