uschindler opened a new pull request, #14928:
URL: https://github.com/apache/lucene/pull/14928

   Currently, DirectIODirectory allocates direct byte buffers outside the 
heap (direct IO requires this) and must align them to the block size. The 
current code may also be wrong if mergeBufferSize is not a multiple of 
blockSize. This PR fixes that so the buffer is correctly aligned and has the 
correct length.
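
   To illustrate the length issue, here is a minimal sketch of rounding a 
requested buffer size to a multiple of the block size. The class and method 
names are hypothetical, and rounding up (rather than down or rejecting the 
value) is an assumption, not necessarily what the PR does.

```java
/** Hypothetical helper, not taken from the PR. */
final class BlockSizeRounding {
  private BlockSizeRounding() {}

  /** Rounds size up to the next multiple of blockSize (assumption: rounding up is acceptable). */
  static long roundUpToBlockSize(long size, long blockSize) {
    if (size <= 0 || blockSize <= 0) {
      throw new IllegalArgumentException("size and blockSize must be positive");
    }
    // Count whole blocks, plus one more if there is a remainder.
    long blocks = size / blockSize + (size % blockSize == 0 ? 0 : 1);
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(blocks, blockSize);
  }
}
```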
   
   With MemorySegments we can improve that:
   - There is a direct allocator method that ensures a MemorySegment is 
allocated with the correct alignment. In contrast to ByteBuffers, the length 
is not aligned, so we have to take care of that ourselves (I added code in 
the ctor to use correct multiples of blockSize).
   - We can convert this MemorySegment to a direct buffer with 
`MemorySegment#asByteBuffer()`. The resulting buffer is compatible with direct 
IO.
   - We can free the buffer using an Arena at the correct time, when closing 
the output or input.
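
   The three points above can be sketched roughly as follows. This is not 
the code from the PR: `AlignedBufferSketch` and `AlignedBuffer` are 
hypothetical names, the sketch assumes JDK 22+ (final `java.lang.foreign` 
API), and it assumes blockSize is a power of two, which 
`Arena#allocate(long, long)` requires for the alignment argument.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;

/** Hypothetical sketch: block-aligned native buffer whose lifetime is tied to an Arena. */
final class AlignedBufferSketch {
  private AlignedBufferSketch() {}

  /** Bundles the arena with the views on its memory so that close() frees the buffer. */
  record AlignedBuffer(Arena arena, MemorySegment segment, ByteBuffer buffer)
      implements AutoCloseable {
    @Override
    public void close() {
      arena.close(); // frees the native memory deterministically
    }
  }

  static AlignedBuffer allocate(long requestedSize, long blockSize) {
    // The allocator aligns the start address only, so the length is rounded up here.
    long blocks = requestedSize / blockSize + (requestedSize % blockSize == 0 ? 0 : 1);
    long size = Math.multiplyExact(blocks, blockSize);
    Arena arena = Arena.ofConfined();
    try {
      MemorySegment segment = arena.allocate(size, blockSize);
      return new AlignedBuffer(arena, segment, segment.asByteBuffer());
    } catch (RuntimeException | Error e) {
      arena.close(); // do not leak the arena if allocation or conversion fails
      throw e;
    }
  }
}
```

   A caller would typically hold the `AlignedBuffer` in a try-with-resources 
block or close it from the owning stream's `close()` method.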
   
   As IndexOutputs are only used by one thread we can use a confined arena and 
allocate the buffer there.
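
   As a small illustration of what "confined" means in practice (a sketch 
with hypothetical names, assuming JDK 22+): memory from a confined arena can 
only be accessed by the thread that created the arena, and access from any 
other thread fails with `WrongThreadException`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo: a confined segment rejects access from a foreign thread. */
final class ConfinedArenaDemo {
  private ConfinedArenaDemo() {}

  /** Returns the exception raised when a second thread touches the confined segment. */
  static Throwable accessFromOtherThread() {
    try (Arena arena = Arena.ofConfined()) {
      MemorySegment segment = arena.allocate(4096, 4096);
      segment.set(ValueLayout.JAVA_BYTE, 0, (byte) 1); // fine: owner thread

      AtomicReference<Throwable> failure = new AtomicReference<>();
      Thread other = new Thread(() -> {
        try {
          segment.get(ValueLayout.JAVA_BYTE, 0); // not the owner thread
        } catch (Throwable t) {
          failure.set(t);
        }
      });
      other.start();
      try {
        other.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
      return failure.get();
    }
  }
}
```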
   
   With IndexInputs it is more complicated: theoretically they should also 
only be used from one thread (RandomAccessInputs too, as far as I remember), 
but unfortunately the buffer is allocated at the time of cloning (which does 
not happen on the thread that later uses the clone). The buffering code is a 
bit cryptic to me and I had no time to look closely into it. As in 
BufferedIndexInput, the buffer should be lazily initialized on the first real 
READ access (not on cloning and not on seeking for the first time after 
cloning). To implement this correctly we may need to refactor the buffer code 
a bit.
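
   A rough sketch of the lazy initialization described above, to make the 
idea concrete. `LazyBufferInput` is a hypothetical stand-in and not the 
DirectIODirectory input; it does no file IO at all and only shows that 
cloning and seeking allocate nothing, while the first read creates a confined 
arena on the reading thread. It assumes JDK 22+ and a power-of-two blockSize.

```java
import java.lang.foreign.Arena;
import java.nio.ByteBuffer;

/** Hypothetical sketch: the native buffer is allocated on the first read only. */
final class LazyBufferInput implements AutoCloseable {
  private final int bufferSize; // assumed to be a multiple of blockSize
  private final int blockSize;  // assumed to be a power of two
  private Arena arena;          // created lazily, confined to the reading thread
  private ByteBuffer buffer;    // null until the first read
  private long position;

  LazyBufferInput(int bufferSize, int blockSize) {
    this.bufferSize = bufferSize;
    this.blockSize = blockSize;
  }

  /** Cloning copies only the position; no native memory is allocated here. */
  LazyBufferInput cloneInput() {
    LazyBufferInput clone = new LazyBufferInput(bufferSize, blockSize);
    clone.position = position;
    return clone;
  }

  /** Seeking only records the position; it does not allocate either. */
  void seek(long newPosition) {
    position = newPosition;
  }

  boolean isBufferAllocated() {
    return buffer != null;
  }

  byte readByte() {
    if (buffer == null) {
      arena = Arena.ofConfined(); // owned by the thread performing the first read
      buffer = arena.allocate(bufferSize, blockSize).asByteBuffer();
      // A real implementation would refill from the file channel here;
      // this sketch just reads from the zero-initialized buffer.
    }
    byte b = buffer.get((int) (position % bufferSize));
    position++;
    return b;
  }

  /** Must be called from the thread that performed the first read (confined arena). */
  @Override
  public void close() {
    if (arena != null) {
      arena.close();
      arena = null;
      buffer = null;
    }
  }
}
```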
   
   Therefore, in this mockup I use an AUTO arena, which lets the garbage 
collector free the buffer. A shared arena is too expensive.
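
   For comparison, a minimal sketch of the AUTO variant (hypothetical names, 
assuming JDK 22+): memory from `Arena.ofAuto()` can be used from any thread 
and is released by the garbage collector once it is unreachable, and an 
explicit `close()` is rejected with `UnsupportedOperationException`.

```java
import java.lang.foreign.Arena;
import java.nio.ByteBuffer;

/** Hypothetical sketch: GC-managed, block-aligned buffer from an automatic arena. */
final class AutoArenaDemo {
  private AutoArenaDemo() {}

  /** size is assumed to be a multiple of blockSize, blockSize a power of two. */
  static ByteBuffer allocateGcManaged(long size, long blockSize) {
    // No arena reference is kept: the memory is freed some time after the
    // returned buffer (and the segment behind it) becomes unreachable.
    return Arena.ofAuto().allocate(size, blockSize).asByteBuffer();
  }

  /** Automatic arenas cannot be closed explicitly. */
  static boolean explicitCloseIsRejected() {
    try {
      Arena.ofAuto().close();
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }
}
```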
   
   If you have an idea how to fix the IndexInput to use a lazy buffer like 
BufferedIndexInput without mixing everything up, tell me. The buffer should be 
confined and allocated only from the thread actually using the clone. An 
alternative is to have a pool of buffers for reuse "per thread" (threadlocal). 
The JDK internally uses a ThreadLocal cache for such temporary direct buffers 
in its own channel IO code.
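
   The "per thread" alternative could look roughly like this. It is only a 
sketch under assumptions: `ThreadLocalBufferPool` is a hypothetical name, it 
keeps exactly one buffer per thread, and it uses an automatic arena so that a 
buffer is reclaimed by the garbage collector after its thread has died. 
JDK 22+ and a power-of-two blockSize are assumed.

```java
import java.lang.foreign.Arena;
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: one reusable, block-aligned buffer per thread. */
final class ThreadLocalBufferPool {
  private final ThreadLocal<ByteBuffer> cache;

  /** size is assumed to be a multiple of blockSize. */
  ThreadLocalBufferPool(int size, int blockSize) {
    cache = ThreadLocal.withInitial(
        () -> Arena.ofAuto().allocate(size, blockSize).asByteBuffer());
  }

  /** Returns the calling thread's buffer, reset for reuse. */
  ByteBuffer acquire() {
    ByteBuffer buffer = cache.get();
    buffer.clear(); // resets position and limit; contents are not zeroed
    return buffer;
  }

  /** Helper for illustration: acquires a buffer on a freshly started thread. */
  ByteBuffer acquireOnOtherThread() {
    AtomicReference<ByteBuffer> result = new AtomicReference<>();
    Thread other = new Thread(() -> result.set(acquire()));
    other.start();
    try {
      other.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return result.get();
  }
}
```

   The trade-off is that only one clone per thread can use the buffer at a 
time, so an input that interleaves reads from several clones on the same 
thread would have to refill more often.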
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

