sgup432 commented on PR #15558:
URL: https://github.com/apache/lucene/pull/15558#issuecomment-4121884743

   I would appreciate some help reviewing this PR. We have seen multiple 
performance issues with the current query cache implementation, and this PR 
aims to resolve them and improve the cache overall.
   
   
   One of the major issues we see in production is high lock wait times:
   
   <img width="813" height="53" alt="JFR_high_lock_1" 
src="https://github.com/user-attachments/assets/613ed452-bab3-4e2f-a921-adc4dda557b3"
 />
   <img width="813" height="194" alt="JFR_high_lock" 
src="https://github.com/user-attachments/assets/f19726ea-ebf7-4b80-a46b-8ba1c562a1c6"
 />
   
   
   This is from one of the JFR dumps taken for an OpenSearch customer who 
was facing tail latency spikes. We have seen multiple instances of the same 
issue.
   
   The profile covers a 5-minute window on one of their OpenSearch data 
nodes. It shows the total lock wait time (more than 7 minutes across 97 
distinct search threads) on this synchronized map get call, which comes from 
the original query cache implementation.
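
   To illustrate the kind of contention described above, here is a minimal 
sketch. It is not the Lucene query cache code and not the change in this PR; 
the class name `SyncMapContentionSketch` and its helpers are hypothetical. It 
only shows that every `get` on a `Collections.synchronizedMap` wrapper takes 
the same monitor, so concurrent readers queue behind each other, whereas 
`ConcurrentHashMap.get` does not block readers.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from Lucene.
public class SyncMapContentionSketch {

    static final int KEYS = 1024;

    // Fills the given map with KEYS entries and returns it.
    static Map<Integer, String> filled(Map<Integer, String> map) {
        for (int i = 0; i < KEYS; i++) {
            map.put(i, "entry-" + i);
        }
        return map;
    }

    // Starts `threads` readers, each doing `lookups` get() calls,
    // and returns the total number of successful lookups.
    static long readConcurrently(Map<Integer, String> map, int threads, int lookups)
            throws InterruptedException {
        AtomicLong hits = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < lookups; i++) {
                    // On a synchronizedMap this call acquires one shared monitor.
                    if (map.get(i % KEYS) != null) {
                        local++;
                    }
                }
                hits.addAndGet(local);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, String> locked = filled(Collections.synchronizedMap(new HashMap<>()));
        Map<Integer, String> concurrent = filled(new ConcurrentHashMap<>());

        long start = System.nanoTime();
        readConcurrently(locked, 16, 1_000_000);
        long lockedMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        readConcurrently(concurrent, 16, 1_000_000);
        long concurrentMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("synchronizedMap reads:   " + lockedMs + " ms");
        System.out.println("ConcurrentHashMap reads: " + concurrentMs + " ms");
    }
}
```

   The timings depend on the machine and JVM, so treat them as indicative 
only. A JFR recording of the first run should attribute monitor wait time to 
the synchronized `get`, similar in shape to the profile above.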
   
   Pinging folks to see if they can help here. 
   @msfroh @benwtrent 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

