alamb commented on code in PR #9583:
URL: https://github.com/apache/arrow-rs/pull/9583#discussion_r3018438964


##########
parquet/src/compression.rs:
##########
@@ -231,10 +231,23 @@ mod snappy_codec {
                 None => decompress_len(input_buf)?,
             };
             let offset = output_buf.len();
-            output_buf.resize(offset + len, 0);
-            self.decoder
-                .decompress(input_buf, &mut output_buf[offset..])
-                .map_err(|e| e.into())
+            output_buf.reserve(len);
+            // SAFETY: we pass the spare capacity to snappy which will write exactly
+            // `len` bytes on success. The `set_len` below is only reached when
+            // decompression succeeds. `MaybeUninit<u8>` has the same layout as `u8`.
+            let spare = output_buf.spare_capacity_mut();
+            let spare_bytes = unsafe {
+                std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
+            };
+            let n = self
+                .decoder
+                .decompress(input_buf, &mut spare_bytes[..len])
+                .map_err(|e| -> ParquetError { e.into() })?;

Review Comment:
   I spent some more time exploring this (and arguing with Codex about it).
   
   The main issue is that the Rust snappy implementation's contract takes an output buffer and does not say whether it can handle uninitialized bytes.
   
   That being said, I can't think of how passing uninitialized bytes as an output location could cause problems (even if snappy changes how it works internally).
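   
   For reference, the pattern in the diff can be sketched without the snappy dependency. In this sketch, `write_exact` is a hypothetical stand-in for `Decoder::decompress`: a writer that is assumed to fill exactly `len` bytes of its output slice and only write through it, never read it. Whether handing such a writer a `&mut [u8]` view over uninitialized spare capacity is sound is exactly the contract question raised above.
   
   ```rust
   use std::mem::MaybeUninit;
   
   // Hypothetical stand-in for snappy's decompress: writes exactly `src.len()`
   // bytes into the front of `dst` and returns how many bytes were written.
   fn write_exact(src: &[u8], dst: &mut [u8]) -> usize {
       dst[..src.len()].copy_from_slice(src);
       src.len()
   }
   
   // Append `input.len()` bytes to `output_buf` via its spare capacity,
   // mirroring the reserve / spare_capacity_mut / set_len sequence in the PR.
   fn append_via_spare_capacity(output_buf: &mut Vec<u8>, input: &[u8]) {
       let len = input.len();
       let offset = output_buf.len();
       output_buf.reserve(len);
       let spare: &mut [MaybeUninit<u8>] = output_buf.spare_capacity_mut();
       // SAFETY: `MaybeUninit<u8>` has the same layout as `u8`, and the writer
       // is assumed to only write through this slice, never read from it.
       let spare_bytes = unsafe {
           std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
       };
       let n = write_exact(input, &mut spare_bytes[..len]);
       // SAFETY: the first `n` spare bytes were just initialized by the writer.
       unsafe { output_buf.set_len(offset + n) };
   }
   ```
   
   Note this sketch deliberately reproduces the debatable step: it materializes a `&mut [u8]` over memory that has not been initialized yet, relying on the writer's (unstated) promise not to read it.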



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
