aik-jahoda opened a new issue, #41:
URL: https://github.com/apache/arrow-dotnet/issues/41

   ### Describe the enhancement requested
   
   `ArrowStreamWriter` allocates a non-trivial number of `byte[][]` and `byte[]` objects. In our scenario, where we create a new `ArrowStreamWriter` for each `RecordBatch`, this causes significant memory overhead.
   
   | Method                 | Mean      | Error     | StdDev     | Ratio | RatioSD | Gen0   | Gen1   | Allocated |
   |----------------------- |----------:|----------:|-----------:|------:|--------:|-------:|-------:|----------:|
   | ArrowStreamWriterCctor | 843.42 ns | 37.066 ns | 108.124 ns |  1.17 |    0.24 | 0.7687 | 0.0248 |    **9656 B** |
   
   The majority of the allocated data comes from [instantiating ArrayPool](https://github.com/apache/arrow/blob/main/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L666).
   
   `ArrayPool<byte>.Create()` is a very expensive operation because it preallocates the whole pool:
   
   
https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/Buffers/ConfigurableArrayPool.cs#L43-L46
   
   | Method                 | Mean      | Error     | StdDev     | Ratio | RatioSD | Gen0   | Gen1   | Allocated |
   |----------------------- |----------:|----------:|-----------:|------:|--------:|-------:|-------:|----------:|
   | ArrayPoolCreate        | 739.13 ns | 40.097 ns | 117.597 ns |  1.03 |    0.23 | 0.6428 | 0.0172 |    **8072 B** |
   
   There are several options for avoiding this unnecessary allocation:
   
   ### Limit the pool size
   This is the simplest change we can make, but it is not perfect. As the pool is used only for arrays of size 4 or 8, we don't need to create a full `ArrayPool`; we can limit it with parameters: `ArrayPool<byte>.Create(8, 50)`
   Here is a comparison of the full and the limited pool:
   
   | Method                 | Mean      | Error     | StdDev     | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
   |----------------------- |----------:|----------:|-----------:|------:|--------:|-------:|-------:|----------:|------------:|
   | ArrayPoolCreate        | 739.13 ns | 40.097 ns | 117.597 ns |  1.03 |    0.23 | 0.6428 | 0.0172 |    8072 B |        1.00 |
   | ArrayPoolCreateSmall   |  68.02 ns |  3.041 ns |   8.823 ns |  0.09 |    0.02 | 0.0414 |      - |     520 B |        0.06 |
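
   A minimal sketch of the limited pool, assuming the same `(8, 50)` arguments as above. The variable names are illustrative and not taken from `ArrowStreamWriter`. Note that `Rent` only guarantees a minimum length, so the rented array is sliced to the size actually needed:

   ```csharp
   using System;
   using System.Buffers;

   // Pool limited to small arrays: maxArrayLength = 8, maxArraysPerBucket = 50.
   ArrayPool<byte> pool = ArrayPool<byte>.Create(maxArrayLength: 8, maxArraysPerBucket: 50);

   byte[] buffer = pool.Rent(4);
   try
   {
       // The rented array may be longer than requested; use only the first 4 bytes.
       Span<byte> lengthPrefix = buffer.AsSpan(0, 4);
       lengthPrefix.Clear();
       Console.WriteLine($"requested 4 bytes, rented {buffer.Length}");
   }
   finally
   {
       // Always hand the array back so the pool can reuse it.
       pool.Return(buffer);
   }
   ```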
   
   ### Use ArrayPool<byte>.Shared
    
   Actually, there is no reason to create our own array pool, because the default implementation of `Stream.Write(ReadOnlySpan<byte> buffer)` rents from `ArrayPool<byte>.Shared` anyway:
https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs#L912-L924
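
   A rough sketch of what renting from the shared pool could look like. `SharedPoolSketch` and `WriteLengthPrefix` are hypothetical names for illustration, not the actual `ArrowStreamWriter` members, and the 4-byte little-endian prefix is an assumption made for the example:

   ```csharp
   using System;
   using System.Buffers;
   using System.Buffers.Binary;
   using System.IO;

   using var stream = new MemoryStream();
   SharedPoolSketch.WriteLengthPrefix(stream, 42);
   Console.WriteLine($"bytes written: {stream.Length}");

   // Hypothetical helper, shown only to illustrate the rent/return pattern.
   static class SharedPoolSketch
   {
       public static void WriteLengthPrefix(Stream stream, int length)
       {
           // No per-writer pool: the process-wide shared pool is used instead.
           byte[] buffer = ArrayPool<byte>.Shared.Rent(sizeof(int));
           try
           {
               BinaryPrimitives.WriteInt32LittleEndian(buffer, length);
               // The rented array may be larger, so write only the prefix bytes.
               stream.Write(buffer, 0, sizeof(int));
           }
           finally
           {
               ArrayPool<byte>.Shared.Return(buffer);
           }
       }
   }
   ```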
   
   ### Don't use an array pool at all
   
   As the array pool is used only in `WriteIpcMessageLength` and `WriteIpcMessageLengthAsync`, it should be enough to keep a single array per writer and reuse it as the buffer. This would imply that `ArrowStreamWriter` is not thread-safe.
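
   A minimal sketch of the reusable-buffer idea. `LengthPrefixWriter` is a hypothetical stand-in for the relevant part of `ArrowStreamWriter`, and the 8-byte buffer size is an assumption based on the 4- and 8-byte sizes mentioned above:

   ```csharp
   using System;
   using System.Buffers.Binary;
   using System.IO;

   using var stream = new MemoryStream();
   var writer = new LengthPrefixWriter(stream);
   writer.WriteLength(42);
   writer.WriteLength(7); // reuses the same buffer, no new allocation
   Console.WriteLine($"bytes written: {stream.Length}");

   // Hypothetical stand-in; not the real ArrowStreamWriter.
   sealed class LengthPrefixWriter
   {
       private readonly Stream _stream;

       // Allocated once per writer and reused for every length prefix.
       // Concurrent calls would overwrite each other, so this is not thread-safe.
       private readonly byte[] _lengthBuffer = new byte[8];

       public LengthPrefixWriter(Stream stream) => _stream = stream;

       public void WriteLength(int length)
       {
           BinaryPrimitives.WriteInt32LittleEndian(_lengthBuffer, length);
           _stream.Write(_lengthBuffer, 0, sizeof(int));
       }
   }
   ```

   The trade-off is one small allocation per writer instead of a pool, at the cost of requiring callers to serialize access to a given writer.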
   
   
   
   ### Component(s)
   
   C#

