aik-jahoda opened a new issue, #47545: URL: https://github.com/apache/arrow/issues/47545
### Describe the enhancement requested

`ArrowStreamWriter` allocates a non-trivial amount of `byte[][]` and `byte[]` objects. In our scenario, where we create a new `ArrowStreamWriter` for each `RecordBatch`, this causes significant memory overhead.

| Method                 | Mean      | Error     | StdDev     | Ratio | RatioSD | Gen0   | Gen1   | Allocated  |
|----------------------- |----------:|----------:|-----------:|------:|--------:|-------:|-------:|-----------:|
| ArrowStreamWriterCctor | 843.42 ns | 37.066 ns | 108.124 ns |  1.17 |    0.24 | 0.7687 | 0.0248 | **9656 B** |

Most of the allocated memory comes from [instantiating the `ArrayPool`](https://github.com/apache/arrow/blob/main/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L666). `ArrayPool<byte>.Create()` is a very expensive operation because it eagerly creates all of the pool's buckets up front: https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/Buffers/ConfigurableArrayPool.cs#L43-L46

| Method          | Mean      | Error     | StdDev     | Ratio | RatioSD | Gen0   | Gen1   | Allocated  |
|---------------- |----------:|----------:|-----------:|------:|--------:|-------:|-------:|-----------:|
| ArrayPoolCreate | 739.13 ns | 40.097 ns | 117.597 ns |  1.03 |    0.23 | 0.6428 | 0.0172 | **8072 B** |

There are several ways to remove this unnecessary allocation:

### Limit the pool size

This is the simplest change we can make, but it is not perfect.
As the pool is only used for arrays of size 4 or 8, we don't need to create a full `ArrayPool`; we can limit it through its parameters instead: `ArrayPool<byte>.Create(8, 50)` (`maxArrayLength: 8`, `maxArraysPerBucket: 50`). Here is a comparison of the full and the limited pool:

| Method               | Mean      | Error     | StdDev     | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|--------------------- |----------:|----------:|-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| ArrayPoolCreate      | 739.13 ns | 40.097 ns | 117.597 ns |  1.03 |    0.23 | 0.6428 | 0.0172 |    8072 B |        1.00 |
| ArrayPoolCreateSmall |  68.02 ns |  3.041 ns |   8.823 ns |  0.09 |    0.02 | 0.0414 |      - |     520 B |        0.06 |

### Use `ArrayPool<byte>.Shared`

There is actually no reason to create our own array pool, because the default `Stream.Write(ReadOnlySpan<byte> buffer)` implementation rents from `ArrayPool<byte>.Shared` anyway: https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs#L912-L924

### Don't use an array pool at all

As the array pool is only used in `WriteIpcMessageLength` and `WriteIpcMessageLengthAsync`, it should be enough to keep a single array in the writer and reuse it as the buffer. This would imply that `ArrowStreamWriter` is not thread-safe.

### Component(s)

C#
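### Sketch of the `Shared` and per-writer buffer options

A minimal sketch of the last two options, assuming the non-legacy 8-byte length prefix (the `0xFFFFFFFF` continuation token followed by a little-endian `int32` length, as in the Arrow IPC encapsulated message format) and ignoring the 4-byte legacy form. `WriteLengthShared` and `WriteLengthScratch` are hypothetical helpers written for illustration; they are not the existing `ArrowStreamWriter` methods.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helpers, not the Apache.Arrow implementation. Both write the
// 8-byte IPC length prefix: the 0xFFFFFFFF continuation token (int32 -1)
// followed by the message length as a little-endian int32.

// Option "Use ArrayPool<byte>.Shared": the process-wide pool already exists,
// so there is no per-writer pool construction cost.
void WriteLengthShared(Stream stream, int length)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(8); // may be longer than 8
    try
    {
        BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(0, 4), -1);
        BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(4, 4), length);
        stream.Write(buffer, 0, 8); // only the first 8 bytes are written
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

// Option "Don't use an array pool at all": one 8-byte scratch array per
// writer, reused on every call. Not safe for concurrent use.
byte[] scratch = new byte[8];
void WriteLengthScratch(Stream stream, int length)
{
    BinaryPrimitives.WriteInt32LittleEndian(scratch.AsSpan(0, 4), -1);
    BinaryPrimitives.WriteInt32LittleEndian(scratch.AsSpan(4, 4), length);
    stream.Write(scratch, 0, 8);
}

var viaShared = new MemoryStream();
var viaScratch = new MemoryStream();
WriteLengthShared(viaShared, 1234);
WriteLengthScratch(viaScratch, 1234);
Console.WriteLine(BitConverter.ToString(viaShared.ToArray())); // prints FF-FF-FF-FF-D2-04-00-00
```

Passing the array to `Stream.Write(byte[], int, int)` rather than a span also skips the rent-and-copy that the base `Stream.Write(ReadOnlySpan<byte>)` performs for streams that do not override it. The scratch-buffer variant allocates 8 bytes once per writer, at the cost of thread safety.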
