rdblue commented on code in PR #9752:
URL: https://github.com/apache/iceberg/pull/9752#discussion_r1501943394


##########
core/src/main/java/org/apache/iceberg/encryption/AesGcmInputFile.java:
##########
@@ -20,39 +20,33 @@
 
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.io.SeekableInputStream;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 
 public class AesGcmInputFile implements InputFile {
   private final InputFile sourceFile;
   private final byte[] dataKey;
   private final byte[] fileAADPrefix;
   private long plaintextLength;
 
+  /**
+   * Important: sourceFile.getLength() must return the verified plaintext 
content length, not the
+   * physical file size after encryption. This protects against tampering with 
the file size in
+   * untrusted storage systems.
+   */
   public AesGcmInputFile(InputFile sourceFile, byte[] dataKey, byte[] 
fileAADPrefix) {
     this.sourceFile = sourceFile;
     this.dataKey = dataKey;
     this.fileAADPrefix = fileAADPrefix;
-    this.plaintextLength = -1;
+    this.plaintextLength = sourceFile.getLength();

Review Comment:
   I think the problem isn't just this path. It is with other valid uses of the 
API.
   
   For instance, if you write data into an encrypted Avro file and go to read 
it, it is valid to open the Avro file without calling `newInputFile(DataFile)`. 
You could use `newInputFile(String)` and that `InputFile` would return the file 
length from the object store, which is the encrypted length. If the decryption 
stream assumes here that the incoming `InputFile` reports the plaintext length, 
there's no way to know or guarantee it.
   
   I think this needs to assume that the incoming file uses the encrypted 
length.
   
   We still need to choose whether to store the plaintext length or the 
encrypted length in table metadata. From your comment here, it looks like it 
would be easier to store the plaintext length because that is what gets 
returned by the writer. I think that makes sense, but we will need to be 
specific in `EncryptingFileIO` what is passed through and where it needs to be 
converted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to