Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

via GitHub Thu, 17 Oct 2024 03:50:36 -0700


uschindler commented on issue #13920:
URL: https://github.com/apache/lucene/issues/13920#issuecomment-2419196663


   Thanks for opening the issue. I already made a similar suggestion in another 
PR and also on the mailing list.
   
   I'd go the route of temporarily changing the IOContext to SEQUENTIAL. This 
may of course slow down random reads, but on the other hand, once the whole file 
is merged away (and was therefore read) it should be in the FS cache anyway. If 
not, you have too little memory; as @s1monw says: "Add more RUM" :-)
   
   Users of the old segment which was merged away will only use it until the 
next IndexReader reopen, so by signaling that we read it only once, it's a good 
idea to get rid of it from the cache soon.
   
   So my proposal is:
   - Add a method to IndexInput to change the IOContext, but document clearly 
that all clones or slices open at the same time are also affected.
   - Before merging segments, we should add a hook to the codec so it can 
call some special method on the incoming CodecReader to "make it ready for 
merging" and "revert to normal use". This could instruct the codec to apply 
different madvise advice or restore it. I am not sure what the best API for 
that is; this was just a quick idea (I haven't looked at the different codec 
components). In general the hooks should be available for all codec 
components, not only DocValues and Vectors, because merging of stored 
fields may also be improved by switching to SEQUENTIAL, due to higher read-ahead 
and fewer paging requests in the kernel.
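
A rough, self-contained sketch of how the two points above could fit together. It is a toy model only and does not use Lucene's real classes: `ReadAdvice`, `SharedAdviceInput`, `updateAdvice` and `MergeHook` are hypothetical names invented for illustration, and the actual madvise call is only indicated by a comment.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the access-pattern hint; NOT Lucene's IOContext.
enum ReadAdvice { RANDOM, SEQUENTIAL }

// Toy input: every clone shares one advice holder, so changing the advice
// on any instance is visible to all clones opened from the same file.
final class SharedAdviceInput {
  private final AtomicReference<ReadAdvice> advice;

  SharedAdviceInput(ReadAdvice initial) {
    this.advice = new AtomicReference<>(initial);
  }

  private SharedAdviceInput(AtomicReference<ReadAdvice> shared) {
    this.advice = shared;
  }

  // The clone keeps a reference to the same holder (assumed design).
  SharedAdviceInput cloneInput() {
    return new SharedAdviceInput(advice);
  }

  ReadAdvice advice() {
    return advice.get();
  }

  // Returns the previous advice so the caller can restore it later.
  // A real implementation would issue madvise on the mapped region here.
  ReadAdvice updateAdvice(ReadAdvice next) {
    return advice.getAndSet(next);
  }
}

// Hypothetical merge hook: "make ready for merging", then "revert to normal use".
final class MergeHook {
  static void runWithSequentialAdvice(SharedAdviceInput in, Runnable merge) {
    ReadAdvice previous = in.updateAdvice(ReadAdvice.SEQUENTIAL);
    try {
      merge.run();
    } finally {
      in.updateAdvice(previous); // restore even if the merge fails
    }
  }
}

class MergeAdviceSketch {
  public static void main(String[] args) {
    SharedAdviceInput in = new SharedAdviceInput(ReadAdvice.RANDOM);
    SharedAdviceInput searcherClone = in.cloneInput();
    MergeHook.runWithSequentialAdvice(in, () ->
        System.out.println("during merge, clone sees: " + searcherClone.advice()));
    System.out.println("after merge, clone sees: " + searcherClone.advice());
  }
}
```

The try/finally is there so the "revert to normal use" step still runs when a merge is aborted. The shared holder makes the documented side effect explicit: clones used by concurrent searches see the SEQUENTIAL advice for the duration of the merge.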


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

