spanglerco opened a new issue, #48: URL: https://github.com/apache/arrow-dotnet/issues/48
### Describe the bug, including details regarding any error messages, version, and platform.

`ArrowStreamReader` can read past the end of an allocated buffer within an unsafe context when reading the schema from a malformed Arrow IPC stream.

Specifically, the unsafe version of [ByteBuffer.GetStringUTF8](https://github.com/apache/arrow-dotnet/blob/710f6e47bd20c5a0dca471a27d67c2fff65d581c/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBuffer.cs#L694) doesn't validate that its parameters are within the buffer's bounds. When reading the name of a field from the schema message, the length of the field name is read from the buffer in [Table.__string](https://github.com/apache/arrow-dotnet/blob/710f6e47bd20c5a0dca471a27d67c2fff65d581c/src/Apache.Arrow/Flatbuf/FlatBuffers/Table.cs#L66). If the stream contains a length that exceeds the size of the buffer, the reader will read past its end.

In my own testing, I've been able to crash the process with an `AccessViolationException` if the out-of-bounds read is large enough, or have it populate the field name with data from surrounding memory if the read isn't too large.

The following reproduction case creates a valid IPC stream that contains only a schema with a single column. It then overwrites bytes 108-111, which hold the length of the column's name (originally 5), with something larger. Setting it to 50 prints a bunch of junk to the console (from the memory after the buffer), while setting it to 500 causes an access violation.
```csharp
using System;
using System.IO;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Apache.Arrow.Types;

const int FieldNameLengthOffset = 108;
const int FakeFieldNameLength = 50; // Changing this to something larger like 500 should crash

using MemoryStream stream = new();

Schema schema = new(
    [new Field("index", Int32Type.Default, nullable: false)],
    metadata: []);

// Write a valid IPC stream that contains only the schema message.
using (ArrowStreamWriter writer = new(stream, schema, leaveOpen: true))
{
    writer.WriteStart();
}

// Overwrite the length prefix of the field name.
// Note: StreamWriter.Write(int) writes the decimal text of the value ("50" as the
// bytes 0x35 0x30), not a 4-byte integer; BinaryWriter.Write(int) would write the
// raw little-endian int32 instead.
stream.Position = FieldNameLengthOffset;
using (StreamWriter writer = new(stream, leaveOpen: true))
{
    writer.Write(FakeFieldNameLength);
}

// Read the stream back; decoding the schema reads the corrupted field name.
stream.Position = 0;
using ArrowStreamReader reader = new(stream);
var name = reader.Schema[0].Name;
Console.WriteLine($"Field name: {name}");
```
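For illustration only, here is a minimal sketch of the kind of validation the report says is missing. It is not the library's code or an actual fix: `ReadFlatBufferString` and `EnsureInBounds` are hypothetical helpers, and the sketch operates on a `ReadOnlySpan<byte>` rather than the library's `ByteBuffer`. It assumes the standard FlatBuffers string layout, where a field holds a 4-byte little-endian relative offset to a 4-byte little-endian length prefix that is followed by the UTF-8 bytes. Every position is checked against the buffer length before anything is decoded.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical bounds-checked reader for a FlatBuffers string reference at 'offset':
// a 4-byte little-endian relative offset pointing to a 4-byte little-endian length
// prefix, followed by that many UTF-8 bytes.
static string ReadFlatBufferString(ReadOnlySpan<byte> buffer, int offset)
{
    // 1. The relative offset itself must lie inside the buffer.
    EnsureInBounds(buffer.Length, offset, sizeof(int));
    long lengthPos = offset + (long)BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice(offset, sizeof(int)));

    // 2. The length prefix it points to must lie inside the buffer.
    EnsureInBounds(buffer.Length, lengthPos, sizeof(int));
    long length = BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice((int)lengthPos, sizeof(int)));

    // 3. The string bytes must end inside the buffer. This is the check the issue
    //    reports as missing before the unsafe GetStringUTF8 decodes the bytes.
    long dataPos = lengthPos + sizeof(int);
    EnsureInBounds(buffer.Length, dataPos, length);

    return Encoding.UTF8.GetString(buffer.Slice((int)dataPos, (int)length));
}

// Throws instead of reading when [start, start + count) is not fully inside the buffer.
static void EnsureInBounds(int bufferLength, long start, long count)
{
    if (start < 0 || count < 0 || start + count > bufferLength)
    {
        throw new InvalidDataException(
            $"Range [{start}, {start + count}) exceeds buffer of {bufferLength} bytes.");
    }
}

// A hand-built 13-byte buffer holding the string "index".
byte[] good = new byte[13];
BinaryPrimitives.WriteUInt32LittleEndian(good.AsSpan(0), 4); // relative offset to the length prefix
BinaryPrimitives.WriteUInt32LittleEndian(good.AsSpan(4), 5); // string length
Encoding.UTF8.GetBytes("index", good.AsSpan(8));             // string bytes
Console.WriteLine(ReadFlatBufferString(good, 0));            // prints: index

// Corrupt the length prefix, as the reproduction does, so it exceeds the buffer.
byte[] bad = (byte[])good.Clone();
BinaryPrimitives.WriteUInt32LittleEndian(bad.AsSpan(4), 50);
try
{
    ReadFlatBufferString(bad, 0);
}
catch (InvalidDataException e)
{
    Console.WriteLine($"Rejected: {e.Message}");
}
```

Working on a span means that even if one of the explicit checks were wrong, `Slice` would still throw rather than read foreign memory. The explicit checks mainly turn a corrupt stream into a clear `InvalidDataException` instead of an `ArgumentOutOfRangeException`.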
