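uschindler commented on issue #13920:
URL: https://github.com/apache/lucene/issues/13920#issuecomment-2419196663

Thanks for opening the issue. I already made a similar suggestion in another PR and on the mailing list.

I'd go this route and temporarily change the IOContext to SEQUENTIAL. This may of course slow down random reads, but on the other hand, once the whole file has been merged away (and was therefore read completely), it should be in the FS cache anyway. If not, you have too little memory, like @s1monw says: "Add more RUM" :-)

Users of the old segment that was merged away will only use it until the next IndexReader reopen, so by signaling that we read it only once, it's a good idea to get rid of it from the cache soon.

So my proposal is:

- Add a method to IndexInput to change the IOContext, but document clearly that all clones or slices opened at the same time are also affected.
- Before merging segments, add a hook to the codec so it can call some special method on the incoming CodecReader to "make it ready for merging" and to "revert to normal use". This could instruct the codec to apply different madvise advice, or to restore the previous advice.

I am not sure what the best API for that is; this was just a quick idea (I haven't looked at the different codec components). In general, the hooks should be available for all codec components, not only DocValues and Vectors, because merging of stored fields may also be improved by switching to SEQUENTIAL, due to higher read-ahead and fewer paging requests in the kernel.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org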
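
For illustration only, the two-part proposal in the comment (an advice change that is visible to all clones, plus a "ready for merging" / "revert to normal use" hook around the merge) could be sketched roughly as below. This is a self-contained toy, not Lucene code: `Advice`, `SketchInput`, `cloneInput`, `updateAdvice` and `withMergeAdvice` are hypothetical names invented for this sketch, and no real madvise call is made.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class MergeAdviceSketch {

  /** Hypothetical stand-in for read-ahead hints; not Lucene's API. */
  enum Advice { RANDOM, SEQUENTIAL }

  /** Hypothetical input: the original and all of its clones share one advice holder. */
  static final class SketchInput {
    private final AtomicReference<Advice> shared;

    SketchInput(Advice initial) {
      this.shared = new AtomicReference<>(initial);
    }

    private SketchInput(AtomicReference<Advice> shared) {
      this.shared = shared;
    }

    /** Clones reuse the same holder, so an advice change affects them too. */
    SketchInput cloneInput() {
      return new SketchInput(shared);
    }

    Advice advice() {
      return shared.get();
    }

    /** Sets the advice for this input and every clone; returns the previous value. */
    Advice updateAdvice(Advice next) {
      return shared.getAndSet(next);
    }
  }

  /** Hypothetical merge hook: switch to SEQUENTIAL, run the merge, then restore. */
  static <T> T withMergeAdvice(SketchInput in, Supplier<T> merge) {
    Advice previous = in.updateAdvice(Advice.SEQUENTIAL);
    try {
      return merge.get();
    } finally {
      // Revert to normal use even if the merge throws.
      in.updateAdvice(previous);
    }
  }

  public static void main(String[] args) {
    SketchInput original = new SketchInput(Advice.RANDOM);
    SketchInput searcherClone = original.cloneInput();

    // The "merge" here only reports what a concurrently used clone would observe.
    Advice seenDuringMerge = withMergeAdvice(original, searcherClone::advice);

    System.out.println("during merge: " + seenDuringMerge);
    System.out.println("after merge: " + searcherClone.advice());
  }
}
```

The shared holder makes the documented side effect explicit: a clone used by a searcher observes SEQUENTIAL while the merge runs and the previous advice again afterwards, which is the trade-off for random reads that the comment mentions.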
Thanks for opening the issue. I already made similar suggestion in another PR and also the mailing list. I'd go the route and temporarily change the IOContext to SEQUENTIAL. This may of course slow down random reads, but on the other hand once the whole file is merged away (and was therefor read) it should be in FS cache anyways. If not, you have too less memory, like @s1monw says: "Add more RUM" :-) Users of the old segment which was merged away will only use it till the next IndexReader reopen, soby signaling that we read it only once it's a good idea to get rid of it from cache soon. So my proposal is: - Add a method to Indexinput to change the IOContext, but document it in a valid way that all clones or slices opened at same time are also affected. - Before merging of segments, we should add a hook to the codec so it can call some special method on the incoming CodecReader to "make it ready for merging" and "revert to normal use". This could instruct the codec to apply different madvise advices or restore them. I am not sure what the best API for that is, was just a quick idea (haven't looked at the different codec components). In general the hooks should be available for all codecs components, not only DocValues and Vectors. Because also merging of stored fields may be improved by switching to SEQUENTIAL to to higher read-ahead and less paging requests in kernel. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org