uschindler commented on issue #13920: URL: https://github.com/apache/lucene/issues/13920#issuecomment-2421745757
> @uschindler On a high level it makes sense to me. I have a couple of questions so that I understand this better > > > Add a method to Indexinput to change the IOContext (without cloning it), > > I am interested in understanding when its appropriate to clone. Based on the [javadoc](https://github.com/apache/lucene/blob/350de210c3674566293681bb58e801629b5ceee7/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L22-L39) for IndexInput, for multithreaded use IndexInput must be cloned. My understanding is that merges will have a separate thread. Since the Readers are pooled and the same instance is used - I cloned it when I was trying to [benchmark the solution](https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316). Would appreciate if you can give insights on why the indexInput shouldn't be cloned in this case. Cloning a reader wont clone any input which is a fully different thing. Don't care about cloning or not for implementing this issue. The Javadocs and usage pattern (when to clone) is more about stateful use of IndexInputs (they have a read position which can't be updated from multiple threads). If you are about pure random access, you can get a view on it as RandomAccessInput (which is sometimes used for Lucene). In all cases: The cloned inputs use exactly the same MemorySegments behind scenes (in former days it was ByteBuffers, those were also duped for clones). What I wanted to say is: When you change the read advice, it will affect all clones, too. Therefor it is not needed to create a clone of the IndexInput. So basically it simplifies thigs: The CodecReader that is used for merging (and used at same time also for searching) can just be instucted to change the read advice on its backing IndexInput. Thats relatively simple to implenment and won't affect the current behaviour how merging works. > > "revert to normal use" > > In the [benchmarking code](https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316), I did not revert it back thinking the reader will be closed and a new reader will be opened with the intended IOContext in this case random. Would you be able to share insights on reverting it back considering there will be new reader. As this is just an advise: When done with merging, just revert back. Opening new readers is too expensive and mostly not useful. In addition, you can't control if searchers using the index in parallel reopen the readers soon. @jpountz already discussed that. This largely depends on how often you reopen IndexReaders. So in general it would be a good idea to revert back to the original state if the IndexReader is still used for longer time. This really depends on the usage pattern. Reverting to normal use is as simple to implement as the initial change. Just change the advice on any of the open IndexInputs, no matter if it is shared by multiple readers and revert back at end. Thats totally uncoupled from the internals of IndexReaders. Uwe -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org