sgup432 commented on PR #15558: URL: https://github.com/apache/lucene/pull/15558#issuecomment-4121884743
I would like some help reviewing this PR. We have seen multiple performance issues with the current query cache implementation, and this PR would resolve them and improve the cache overall. One of the major issues we see in production is high lock wait time:

<img width="813" height="53" alt="JFR_high_lock_1" src="https://github.com/user-attachments/assets/613ed452-bab3-4e2f-a921-adc4dda557b3" />
<img width="813" height="194" alt="JFR_high_lock" src="https://github.com/user-attachments/assets/f19726ea-ebf7-4b80-a46b-8ba1c562a1c6" />

This is from one of the JFR dumps taken for an OpenSearch customer who was facing tail latency spikes; we have seen multiple instances of the same issue. The profile covers a 5-minute window on one of their OpenSearch data nodes. It shows the total lock wait time (more than 7 minutes across 97 distinct search threads) on the synchronized map `get` call, which comes from the original query cache implementation.

Pinging folks to see if they can help here: @msfroh @benwtrent
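
For readers unfamiliar with why a synchronized map `get` can show up as lock wait time, here is a minimal sketch. It is not the query cache code and not the approach taken in this PR; the class name `SynchronizedMapContentionSketch` and the helper `readerCompletesWhileMonitorHeld` are hypothetical names made up for illustration. It assumes only the documented behavior that a map returned by `Collections.synchronizedMap` guards every call, including `get`, with the wrapper's own monitor, whereas `ConcurrentHashMap.get` does not take that monitor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from Lucene or from this PR.
public class SynchronizedMapContentionSketch {

  /**
   * Returns true if a reader thread calling map.get(key) finishes within
   * 500 ms while another thread is holding the map object's monitor.
   */
  static boolean readerCompletesWhileMonitorHeld(Map<String, Integer> map, String key)
      throws InterruptedException {
    CountDownLatch monitorHeld = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    // Stand-in for any other thread that is currently inside the map.
    Thread holder = new Thread(() -> {
      synchronized (map) {
        monitorHeld.countDown();
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    holder.start();
    monitorHeld.await();

    // Stand-in for a search thread doing a cache lookup.
    CountDownLatch readDone = new CountDownLatch(1);
    Thread reader = new Thread(() -> {
      map.get(key);
      readDone.countDown();
    });
    reader.start();

    boolean completed = readDone.await(500, TimeUnit.MILLISECONDS);

    // Let both threads finish before returning.
    release.countDown();
    holder.join();
    reader.join();
    return completed;
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, Integer> base = new HashMap<>();
    base.put("q", 1);

    Map<String, Integer> syncMap = Collections.synchronizedMap(base);
    Map<String, Integer> concurrentMap = new ConcurrentHashMap<>(base);

    System.out.println("synchronizedMap reader completed: "
        + readerCompletesWhileMonitorHeld(syncMap, "q"));
    System.out.println("ConcurrentHashMap reader completed: "
        + readerCompletesWhileMonitorHeld(concurrentMap, "q"));
  }
}
```

In this sketch the reader on the synchronized map stays blocked until the holder releases the monitor, while the reader on the `ConcurrentHashMap` returns right away. With many search threads doing lookups against one synchronized map, that same serialization is one plausible way to accumulate the kind of lock wait time a JFR profile reports.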
