sgup432 opened a new issue, #14999:
URL: https://github.com/apache/lucene/issues/14999

   ### Description
   
   During a recent issue in OpenSearch, we observed high lock wait times (>50ms) in 
`IndexReader.registerParentReader` during both the query and fetch phases. At 
that time, an OpenSearchLeafReader had more than 30k parent readers, and with 
many leaf readers (i.e. segments) per Lucene index, this added significant 
latency to search requests, especially at the tail.
   
   Each IndexReader appears to track its parents using a synchronized set that 
holds weak references. Entries whose referents the GC has already reclaimed 
are cleared during subsequent `synchronizedSet.add` calls. At the time of this 
issue, the set had grown beyond 30k entries, and one thread was holding the 
lock while it cleaned up stale references. As a result, other threads calling 
`synchronizedSet.add` were blocked, leading to contention and delay.
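
The structure described above can be sketched with plain JDK collections. This is a minimal model, assuming the set is built from `Collections.synchronizedSet` over a `WeakHashMap`-backed set; `ToyReader`, `registerParent` and `parentCount` are hypothetical names, not Lucene classes or methods. `WeakHashMap` expunges reclaimed entries inside ordinary operations such as `put` and `size`, so whichever thread calls `add` next pays for the cleanup while holding the single monitor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;

/** Toy stand-in for a reader (hypothetical, not a Lucene class). */
final class ToyReader {
  // One monitor guards every operation; keys are held only weakly.
  private final Set<ToyReader> parents =
      Collections.synchronizedSet(
          Collections.newSetFromMap(new WeakHashMap<ToyReader, Boolean>()));

  /** Adds a parent. Stale (GC-cleared) entries are expunged here, under the lock. */
  void registerParent(ToyReader parent) {
    parents.add(parent);
  }

  /** Number of entries currently in the set (also triggers expunging). */
  int parentCount() {
    return parents.size();
  }
}

final class ToyReaderDemo {
  public static void main(String[] args) {
    ToyReader leaf = new ToyReader();
    List<ToyReader> stronglyHeld = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      ToyReader parent = new ToyReader();
      stronglyHeld.add(parent); // keeps the parent reachable, so its entry stays
      leaf.registerParent(parent);
    }
    System.out.println(leaf.parentCount() + " parents for " + stronglyHeld.size() + " live readers");
  }
}
```

When the parents are not strongly held elsewhere, their entries linger until the GC clears the references and a later call expunges them, which is how the set can grow to tens of thousands of entries under wrapper churn.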
   
   A sample lock profile is shown below. About 33% of the time is spent 
trying to add elements to the set.
   
   <img width="1486" height="154" alt="Image" 
src="https://github.com/user-attachments/assets/3d8380c4-e1e7-4133-8ed7-7d7bfde92636"
 />
   
   **More context**:
   In OpenSearch, each query and fetch phase maintains a per-request context by 
wrapping the underlying reader in a NonClosingReaderWrapper. This ensures that 
plugins can safely wrap the reader without risking accidental closure of the 
underlying resource. For every request on a shard, multiple short-lived 
NonClosingReaderWrapper instances (along with other plugin-provided 
IndexReaderWrappers) are created and registered as parents of the 
OpenSearchLeafReader.
   
   This leads to lock contention in `synchronizedSet.add`, especially when 
stale entries in the parentReaders set are being cleared. Under such 
conditions, lock wait times can exceed 100ms, depending on the size of the 
parentReaders set. The issue becomes more pronounced under high throughput per 
shard, where the frequency of reader registration increases significantly.
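
The registration pattern can be modelled roughly as follows. This is a sketch under stated assumptions, not OpenSearch code: `SharedLeaf`, `PerRequestWrapper` and `WrapperChurn` are hypothetical stand-ins, and the set is assumed to have the synchronized, weak-reference-backed shape described earlier. Each simulated request creates a wrapper that registers itself with the shared leaf and then immediately becomes garbage, so every request thread funnels through the same lock.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a leaf reader shared by all requests on a shard. */
final class SharedLeaf {
  private final Set<Object> parents =
      Collections.synchronizedSet(
          Collections.newSetFromMap(new WeakHashMap<Object, Boolean>()));
  final AtomicLong registrations = new AtomicLong();

  void registerParent(Object wrapper) {
    parents.add(wrapper); // all request threads serialize on this one monitor
    registrations.incrementAndGet();
  }
}

/** Hypothetical per-request wrapper that registers itself on construction. */
final class PerRequestWrapper {
  PerRequestWrapper(SharedLeaf leaf) {
    leaf.registerParent(this);
  }
}

final class WrapperChurn {
  /** Runs the given number of worker threads and returns the elapsed time in nanoseconds. */
  static long run(SharedLeaf leaf, int threads, int requestsPerThread) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < requestsPerThread; i++) {
          new PerRequestWrapper(leaf); // unreachable right after registration
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    SharedLeaf leaf = new SharedLeaf();
    long nanos = run(leaf, 8, 100_000);
    System.out.println(leaf.registrations.get() + " registrations in " + nanos / 1_000_000 + " ms");
  }
}
```

Timings from a toy like this depend heavily on heap size and GC behaviour, so it only illustrates where the serialization happens; it does not reproduce the numbers reported here.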
   
   **Is it a Lucene issue?**
   I understand that one could argue this is primarily an OpenSearch issue, 
specifically how it creates and manages per-request reader wrappers. However, I 
believe Lucene should offer a basic mechanism at the IndexReader level to 
exclude certain instances from being tracked as parents, or alternatively, 
improve its current approach. The existing use of a synchronizedSet with weak 
references can lead to performance bottlenecks in such cases. I am curious to 
know what Lucene folks think about this.
   
   
   **Some solutions I can think of are below** 
   Open to any other ideas.
   
   1. Provide a way to exclude certain index readers from parent tracking. Any 
short-lived reader wrapper can then return false from trackAsParent and avoid 
being tracked. I did a simple implementation of this, and it improved p99 
latency by up to ~30-40% in some cases.
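   ```
   // Excerpt of the proposed change; other IndexReader members omitted.
   class IndexReader {
     // Proposed hook: short-lived wrappers override this to return false.
     public boolean trackAsParent() {
       return true;
     }
   
     public final void registerParentReader(IndexReader reader) {
       ensureOpen();
       // Only readers that opt in are added to the synchronized parent set.
       if (reader.trackAsParent()) {
         parentReaders.add(reader);
       }
     }
   }
   ```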
   
   2. If not the above, consider making `registerParentReader` non-final so 
its behavior can be overridden.
   
   3. Alternatively, provide an explicit way to remove a parent reader from the 
set, so that once short-lived reader wrappers are closed, we can remove them 
immediately and keep the synchronizedSet from growing. This might still have 
some contention, but I will have to test it out.
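
Options 1 and 3 can be compared side by side in a small self-contained model. This is only a sketch: `SketchReader`, `ShortLivedWrapper`, `unregisterParent` and `trackedParents` are hypothetical names, and neither `trackAsParent` nor an explicit removal method exists on Lucene's `IndexReader` today.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

/** Toy model of a reader with parent tracking (hypothetical, not Lucene's IndexReader). */
class SketchReader {
  private final Set<SketchReader> parents =
      Collections.synchronizedSet(
          Collections.newSetFromMap(new WeakHashMap<SketchReader, Boolean>()));

  /** Option 1: short-lived wrappers override this to opt out of tracking. */
  protected boolean trackAsParent() {
    return true;
  }

  final void registerParent(SketchReader parent) {
    if (parent.trackAsParent()) {
      parents.add(parent); // opted-out readers never touch the lock
    }
  }

  /** Option 3: drop the entry eagerly instead of waiting for the GC. */
  final void unregisterParent(SketchReader parent) {
    parents.remove(parent); // still takes the lock, but keeps the set small
  }

  final int trackedParents() {
    return parents.size();
  }
}

/** A per-request wrapper that opts out of tracking. */
final class ShortLivedWrapper extends SketchReader {
  @Override
  protected boolean trackAsParent() {
    return false;
  }
}

final class ParentTrackingDemo {
  public static void main(String[] args) {
    SketchReader leaf = new SketchReader();
    SketchReader longLived = new SketchReader();
    leaf.registerParent(longLived);
    leaf.registerParent(new ShortLivedWrapper()); // skipped by option 1
    System.out.println("tracked after register: " + leaf.trackedParents());
    leaf.unregisterParent(longLived); // option 3
    System.out.println("tracked after unregister: " + leaf.trackedParents());
  }
}
```

One trade-off to weigh: as far as I understand, parent tracking exists so that closing a child reader can mark its parents as closed, so a wrapper that opts out would presumably no longer receive that notification.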
   



