spanglerco opened a new issue, #48:
URL: https://github.com/apache/arrow-dotnet/issues/48

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `ArrowStreamReader` can read past the end of an allocated buffer, inside an unsafe context, when reading the schema from a malformed Arrow IPC stream. Specifically, the unsafe version of [ByteBuffer.GetStringUTF8](https://github.com/apache/arrow-dotnet/blob/710f6e47bd20c5a0dca471a27d67c2fff65d581c/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBuffer.cs#L694) doesn't validate that its parameters are within the buffer's bounds. When reading a field name from the schema message, [Table.__string](https://github.com/apache/arrow-dotnet/blob/710f6e47bd20c5a0dca471a27d67c2fff65d581c/src/Apache.Arrow/Flatbuf/FlatBuffers/Table.cs#L66) reads the name's length from the buffer itself. If the stream contains a length that exceeds the size of the buffer, the reader reads past the end.
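   
   A minimal sketch of the kind of check that is missing, assuming the FlatBuffers string layout (a 4-byte little-endian length prefix followed by UTF-8 bytes). `CheckedFlatBufferString.Read` is a hypothetical helper written for illustration; it is not part of Apache.Arrow or FlatBuffers, and not necessarily how the library should fix this:
   
   ```csharp
   using System;
   using System.Buffers.Binary;
   using System.IO;
   using System.Text;
   
   // Hypothetical helper (not an Apache.Arrow API): reads a length-prefixed
   // UTF-8 string and refuses lengths that would run past the end of the buffer.
   public static class CheckedFlatBufferString
   {
       public static string Read(ReadOnlySpan<byte> buffer, int lengthPosition)
       {
           // The 4-byte length prefix itself must lie inside the buffer.
           if (lengthPosition < 0 || lengthPosition > buffer.Length - sizeof(int))
               throw new InvalidDataException(
                   $"Length prefix at {lengthPosition} is outside a {buffer.Length}-byte buffer.");
   
           int length = BinaryPrimitives.ReadInt32LittleEndian(buffer.Slice(lengthPosition, sizeof(int)));
           int start = lengthPosition + sizeof(int);
   
           // Compare against the bytes remaining rather than computing start + length,
           // which could overflow for a hostile length value.
           if (length < 0 || length > buffer.Length - start)
               throw new InvalidDataException(
                   $"String length {length} exceeds the {buffer.Length - start} bytes remaining.");
   
           return Encoding.UTF8.GetString(buffer.Slice(start, length));
       }
   }
   ```
   
   With a bounds check like this, the corrupted length in the reproduction would surface as an exception instead of an out-of-bounds read.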
   
   In my own testing, I've been able to crash the process with an `AccessViolationException` if the out-of-bounds read is large enough, or have it populate the field name with data from surrounding memory if the read is smaller.
   
   The following reproduction case creates a valid IPC stream that only has a schema with a single column. It then overwrites bytes 108-111, which hold the length of the column's name (originally 5), with something larger. Setting it to 50 prints a bunch of junk to the console (from the memory after the buffer), while setting it to 500 causes an access violation.
   
   ```csharp
   using System;
   using System.IO;
   using Apache.Arrow;
   using Apache.Arrow.Ipc;
   using Apache.Arrow.Types;
   
   const int FieldNameLengthOffset = 108;
   const int FakeFieldNameLength = 50; // Changing this to something larger like 500 should crash
   
   using MemoryStream stream = new();
   Schema schema = new(
       [new Field("index", Int32Type.Default, nullable: false)],
       metadata: []);
   using (ArrowStreamWriter writer = new(stream, schema, leaveOpen: true))
   {
       writer.WriteStart();
   }
   
   stream.Position = FieldNameLengthOffset;
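   // BinaryWriter writes the int as 4 little-endian bytes; StreamWriter would
   // write its text form ("50") instead of a 32-bit length.
   using (BinaryWriter writer = new(stream, System.Text.Encoding.UTF8, leaveOpen: true))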
   {
       writer.Write(FakeFieldNameLength);
   }
   stream.Position = 0;
   using ArrowStreamReader reader = new(stream);
   var name = reader.Schema[0].Name;
   Console.WriteLine($"Field name: {name}");
   ```
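   
   The edit in the reproduction amounts to overwriting a 4-byte little-endian integer in the serialized bytes. As a sketch, assuming the stream has been copied into a `byte[]` (for example with `MemoryStream.ToArray()`), the hypothetical helper `IpcPatch.PatchInt32LittleEndian` below makes the same change without going through a writer. It returns the previous value so a caller can confirm the offset really points at the name length:
   
   ```csharp
   using System;
   using System.Buffers.Binary;
   
   // Hypothetical test helper (not an Apache.Arrow API): overwrites a 4-byte
   // little-endian integer inside an already-serialized IPC stream.
   public static class IpcPatch
   {
       // Returns the value that was there before the patch.
       public static int PatchInt32LittleEndian(byte[] ipcBytes, int offset, int newValue)
       {
           // AsSpan throws ArgumentOutOfRangeException if the 4 bytes don't fit.
           Span<byte> field = ipcBytes.AsSpan(offset, sizeof(int));
           int previous = BinaryPrimitives.ReadInt32LittleEndian(field);
           BinaryPrimitives.WriteInt32LittleEndian(field, newValue);
           return previous;
       }
   }
   ```
   
   For the single-column schema above, `IpcPatch.PatchInt32LittleEndian(bytes, 108, 50)` would be expected to return 5; offset 108 comes from this report and depends on the exact schema being written.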


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
