Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

via GitHub Fri, 18 Oct 2024 01:00:25 -0700


uschindler commented on issue #13920:
URL: https://github.com/apache/lucene/issues/13920#issuecomment-2421745757


   > @uschindler On a high level it makes sense to me. I have a couple of 
questions so that I understand this better
   > 
   > > Add a method to Indexinput to change the IOContext (without cloning it),
   > 
   > I am interested in understanding when its appropriate to clone. Based on 
the 
[javadoc](https://github.com/apache/lucene/blob/350de210c3674566293681bb58e801629b5ceee7/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L22-L39)
 for IndexInput, for multithreaded use IndexInput must be cloned. My 
understanding is that merges will have a separate thread. Since the Readers are 
pooled and the same instance is used - I cloned it when I was trying to 
[benchmark the 
solution](https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316).
 Would appreciate if you can give insights on why the indexInput shouldn't be 
cloned in this case.
   
   Cloning a reader wont clone any input which is a fully different thing. 
Don't care about cloning or not for implementing this issue. The Javadocs and 
usage pattern (when to clone) is more about stateful use of IndexInputs (they 
have a read position which can't be updated from multiple threads). If you are 
about pure random access, you can get a view on it as RandomAccessInput (which 
is sometimes used for Lucene). In all cases: The cloned inputs use exactly the 
same MemorySegments behind scenes (in former days it was ByteBuffers, those 
were also duped for clones).
   
   What I wanted to say is: When you change the read advice, it will affect all 
clones, too. Therefor it is not needed to create a clone of the IndexInput. So 
basically it simplifies thigs: The CodecReader that is used for merging (and 
used at same time also for searching) can just be instucted to change the read 
advice on its backing IndexInput. Thats relatively simple to implenment and 
won't affect the current behaviour how merging works.
   
   > > "revert to normal use"
   > 
   > In the [benchmarking 
code](https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316),
 I did not revert it back thinking the reader will be closed and a new reader 
will be opened with the intended IOContext in this case random. Would you be 
able to share insights on reverting it back considering there will be new 
reader.
   
   As this is just an advise: When done with merging, just revert back. Opening 
new readers is too expensive and mostly not useful. In addition, you can't 
control if searchers using the index in parallel reopen the readers soon. 
@jpountz already discussed that. This largely depends on how often you reopen 
IndexReaders. So in general it would be a good idea to revert back to the 
original state if the IndexReader is still used for longer time. This really 
depends on the usage pattern.
   
   Reverting to normal use is as simple to implement as the initial change. 
Just change the advice on any of the open IndexInputs, no matter if it is 
shared by multiple readers and revert back at end. Thats totally uncoupled from 
the internals of IndexReaders.
   
   Uwe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

Reply via email to