mbutrovich commented on code in PR #2584:
URL: https://github.com/apache/iceberg-rust/pull/2584#discussion_r3414713299
##########
crates/iceberg/src/arrow/reader/pipeline.rs:
##########
@@ -431,14 +437,44 @@ impl ArrowReader {
)
.with_parquet_read_options(parquet_read_options);
- let arrow_metadata = ArrowReaderMetadata::load_async(&mut reader, Default::default())
+ let arrow_reader_options = Self::build_arrow_reader_options(key_metadata)?;
+
+ let arrow_metadata = ArrowReaderMetadata::load_async(&mut reader, arrow_reader_options)
.await
.map_err(|e| {
Error::new(ErrorKind::Unexpected, "Failed to load Parquet metadata").with_source(e)
})?;
Ok((reader, arrow_metadata))
}
+
+ /// Builds `ArrowReaderOptions`, adding `FileDecryptionProperties` when
+ /// key metadata is present for Parquet Modular Encryption.
+ fn build_arrow_reader_options(key_metadata: Option<&[u8]>) -> Result<ArrowReaderOptions> {
+ match key_metadata {
+ Some(km) => {
+ let standard_key_metadata = StandardKeyMetadata::decode(km)?;
+ let mut builder = FileDecryptionProperties::builder(
+ standard_key_metadata.encryption_key().as_bytes().to_vec(),
Review Comment:
The decoded DEK is passed straight to
`FileDecryptionProperties::builder(key)`. A malformed key currently surfaces as
arrow-rs's generic build/decrypt error. Would an explicit check that the key is
a valid AES length (16/24/32 bytes), returning a clear `iceberg::Error`, be
worth adding? That is a real invariant with a better message than the
downstream failure.
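
A minimal sketch of the check suggested above, written as standalone Rust so it compiles without the crate. `InvalidKeyLength` and `validate_dek` are hypothetical names used only for illustration; in the PR the error would presumably be an `iceberg::Error` with a suitable `ErrorKind`, and the call would sit between `StandardKeyMetadata::decode` and `FileDecryptionProperties::builder`.

```rust
use std::fmt;

/// Hypothetical stand-in for `iceberg::Error`; not part of the PR.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidKeyLength {
    pub actual: usize,
}

impl fmt::Display for InvalidKeyLength {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "invalid data encryption key length: expected 16, 24, or 32 bytes, got {}",
            self.actual
        )
    }
}

impl std::error::Error for InvalidKeyLength {}

/// Returns the key unchanged when its length is a valid AES key size
/// (AES-128, AES-192, or AES-256); otherwise reports the actual length.
pub fn validate_dek(key: &[u8]) -> Result<&[u8], InvalidKeyLength> {
    match key.len() {
        16 | 24 | 32 => Ok(key),
        actual => Err(InvalidKeyLength { actual }),
    }
}
```

With a check like this, a truncated or corrupted key would fail before any decryption properties are built, and the message would name the offending length.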
##########
crates/iceberg/src/arrow/reader/pipeline.rs:
##########
@@ -431,14 +437,44 @@ impl ArrowReader {
)
.with_parquet_read_options(parquet_read_options);
- let arrow_metadata = ArrowReaderMetadata::load_async(&mut reader, Default::default())
+ let arrow_reader_options = Self::build_arrow_reader_options(key_metadata)?;
+
+ let arrow_metadata = ArrowReaderMetadata::load_async(&mut reader, arrow_reader_options)
.await
.map_err(|e| {
Error::new(ErrorKind::Unexpected, "Failed to load Parquet metadata").with_source(e)
})?;
Ok((reader, arrow_metadata))
}
+
+ /// Builds `ArrowReaderOptions`, adding `FileDecryptionProperties` when
+ /// key metadata is present for Parquet Modular Encryption.
+ fn build_arrow_reader_options(key_metadata: Option<&[u8]>) -> Result<ArrowReaderOptions> {
+ match key_metadata {
+ Some(km) => {
+ let standard_key_metadata = StandardKeyMetadata::decode(km)?;
Review Comment:
`StandardKeyMetadata.file_length` is parsed but unused; the size used for
reading comes from `task.file_size_in_bytes`. That matches Java, where
`fileLength()` has no consumer on the native-encryption path (it is for AGS1
stream decryption, not PME data files), so this looks correct. It is not worth
asserting `file_length == file_size_in_bytes`: `file_length` is optional and
the spec does not guarantee the two are equal.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]