alamb commented on code in PR #9583:
URL: https://github.com/apache/arrow-rs/pull/9583#discussion_r3018438964
##########
parquet/src/compression.rs:
##########
@@ -231,10 +231,23 @@ mod snappy_codec {
None => decompress_len(input_buf)?,
};
let offset = output_buf.len();
-        output_buf.resize(offset + len, 0);
-        self.decoder
-            .decompress(input_buf, &mut output_buf[offset..])
-            .map_err(|e| e.into())
+        output_buf.reserve(len);
+        // SAFETY: we pass the spare capacity to snappy which will write exactly
+        // `len` bytes on success. The `set_len` below is only reached when
+        // decompression succeeds. `MaybeUninit<u8>` has the same layout as `u8`.
+        let spare = output_buf.spare_capacity_mut();
+        let spare_bytes = unsafe {
+            std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
+        };
+        let n = self
+            .decoder
+            .decompress(input_buf, &mut spare_bytes[..len])
+            .map_err(|e| -> ParquetError { e.into() })?;
Review Comment:
I spent some more time exploring this (and arguing with Codex about it).
The main issue is that the Rust snappy implementation's contract takes an
output buffer and doesn't say whether it can handle uninitialized bytes.
That being said, I can't think of how passing uninitialized bytes as an
output location could cause problems (even if snappy changes how it works
internally).
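For illustration, here is a minimal, self-contained sketch of the spare-capacity pattern the diff uses. A plain byte copy stands in for the snappy `decompress` call (the `snap` crate and the surrounding codec struct are not assumed here); the structure — `reserve`, write into `spare_capacity_mut`, then `set_len` only on success — mirrors the diff.

```rust
// Hypothetical stand-in for the codec path in the diff: `append_decoded`
// plays the role of `decompress`, and the byte copy plays the role of
// snappy writing exactly `len` bytes into the output.
fn append_decoded(output_buf: &mut Vec<u8>, input: &[u8]) {
    let len = input.len(); // stands in for `decompress_len(input_buf)?`
    let offset = output_buf.len();
    output_buf.reserve(len);
    // SAFETY: `MaybeUninit<u8>` has the same layout as `u8`, and every byte
    // in `..len` is written before `set_len` is called. Note this is the
    // same contract question the review raises: the callee must tolerate
    // receiving uninitialized bytes as its output slice.
    let spare = output_buf.spare_capacity_mut();
    let spare_bytes = unsafe {
        std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
    };
    spare_bytes[..len].copy_from_slice(input); // stand-in for `decoder.decompress`
    // SAFETY: the `len` bytes past `offset` are now initialized.
    unsafe { output_buf.set_len(offset + len) };
}

fn main() {
    let mut out = b"header:".to_vec();
    append_decoded(&mut out, b"payload");
    assert_eq!(out, b"header:payload");
}
```

The point of the pattern is to avoid the `resize(offset + len, 0)` zero-fill that the old code paid for before every decompression, at the cost of the `unsafe` contract discussed above.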
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]