Re: [I] Refactor QueryCache to improve concurrency and performance [lucene]

via GitHub Thu, 20 Feb 2025 17:51:22 -0800


sgup432 commented on issue #14222:
URL: https://github.com/apache/lucene/issues/14222#issuecomment-2673153965


   I got busy with other work but found some time to run an initial benchmark 
for this.
   
   For simplicity, I micro-benchmarked the `putIfAbsent()` and `get()` methods 
of QueryCache in isolation. Here is the benchmark 
[code](https://github.com/sgup432/lucene/blob/query_cache_test/lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/QueryCacheBenchmark.java).
   It creates 10,000 sample queries and 16 cacheHelpers (one per assumed 
Lucene segment).
   
   
   I created an 
[LRUQueryCacheV2](https://github.com/sgup432/lucene/blob/query_cache_test/lucene/core/src/java/org/apache/lucene/search/LRUQueryCacheV2.java)
   with the changes recommended above. It creates 16 QueryCacheSegments (for 
this test), each with its own in-memory 
[map](https://github.com/sgup432/lucene/blob/query_cache_test/lucene/core/src/java/org/apache/lucene/search/LRUQueryCacheV2.java#L170)
   that stores a composite key and its value. The composite key is simply the 
pair `(CacheKey, Query)`, and its `hashCode()` determines which partition an 
entry goes to. The rest is very similar to the existing 
[LRUQueryCache](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java).
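A minimal sketch of the partitioning idea described above, assuming a fixed number of partitions, one `ReentrantLock` per partition, and an access-ordered `LinkedHashMap` for LRU order. The names (`PartitionedLruCache`, `CompositeKey`, `partitionIndex`) are hypothetical, `Object` stands in for Lucene's `CacheKey` and `Query`, and this is not the code in the linked `LRUQueryCacheV2`:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a partitioned LRU cache keyed by (cacheKey, query). */
final class PartitionedLruCache<V> {

  /** Composite key: the record derives equals()/hashCode() from both components. */
  record CompositeKey(Object cacheKey, Object query) {}

  /** One partition: its own lock and its own access-ordered (LRU) map. */
  private static final class Partition<T> {
    final ReentrantLock lock = new ReentrantLock();
    final LinkedHashMap<CompositeKey, T> map;

    Partition(int maxEntries) {
      // accessOrder = true: iteration runs from least to most recently used
      this.map = new LinkedHashMap<CompositeKey, T>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<CompositeKey, T> eldest) {
          return size() > maxEntries; // evict the LRU entry once over capacity
        }
      };
    }
  }

  private final Partition<V>[] partitions;

  @SuppressWarnings("unchecked")
  PartitionedLruCache(int numPartitions, int maxEntriesPerPartition) {
    partitions = (Partition<V>[]) new Partition<?>[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      partitions[i] = new Partition<>(maxEntriesPerPartition);
    }
  }

  /** The composite key's hashCode() picks the partition; floorMod keeps it non-negative. */
  int partitionIndex(CompositeKey key) {
    return Math.floorMod(key.hashCode(), partitions.length);
  }

  V get(Object cacheKey, Object query) {
    CompositeKey key = new CompositeKey(cacheKey, query);
    Partition<V> p = partitions[partitionIndex(key)];
    p.lock.lock(); // only this partition is locked; the others stay available
    try {
      return p.map.get(key); // in access order, get() also marks the entry as recently used
    } finally {
      p.lock.unlock();
    }
  }

  void putIfAbsent(Object cacheKey, Object query, V value) {
    CompositeKey key = new CompositeKey(cacheKey, query);
    Partition<V> p = partitions[partitionIndex(key)];
    p.lock.lock();
    try {
      p.map.putIfAbsent(key, value); // may trigger removeEldestEntry
    } finally {
      p.lock.unlock();
    }
  }

  int size() {
    int total = 0;
    for (Partition<V> p : partitions) {
      p.lock.lock();
      try {
        total += p.map.size();
      } finally {
        p.lock.unlock();
      }
    }
    return total;
  }
}
```

Because a lookup only takes the lock of the partition its key hashes to, threads touching different partitions do not contend, which is the kind of gain one would expect over a single cache-wide lock.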
   
   Parts of the eviction logic are not yet written for v2, such as clearing 
entries when a Lucene segment is merged. I also ran the existing QueryCache unit 
tests against LRUQueryCacheV2 as a high-level correctness check: 16 pass and 10 
fail, mainly because of the incomplete eviction logic.
   
   Coming to the results:
   
   Here, v1 refers to the existing QueryCache and v2 refers to my version of 
QueryCache.
   The benchmarks can be run with: `java --module-path lucene/benchmark-jmh/build/benchmarks --module org.apache.lucene.benchmark.jmh QueryCacheBenchmark`
   
   ## Performance Comparison: v1 vs. v2
   
   | **Benchmark** | **Version** | **Throughput (ops/s)** | **Error (ops/s)** | **Performance Gain (v2 vs. v1)** |
   |---|---|---:|---:|---:|
   | **Concurrent Get & Put (Mixed Load)** | | | | |
   | `concurrentGetAndPuts` | v1 | **1,857,864** | ±57,408 | **3.02x** |
   | `concurrentGetAndPuts_v2` | v2 | **5,614,289** | ±96,352 | |
   | **Get Performance (Read-Only in Mixed Load)** | | | | |
   | `concurrentGetAndPuts_get` | v1 | **814,891** | ±75,165 | **5.27x** |
   | `concurrentGetAndPuts_getV2` | v2 | **4,298,377** | ±114,633 | |
   | **Put Performance (Write-Only in Mixed Load)** | | | | |
   | `concurrentGetAndPuts_put` | v1 | **1,042,973** | ±49,868 | **1.26x** |
   | `concurrentGetAndPuts_putV2` | v2 | **1,315,912** | ±32,133 | |
   | **Concurrent Puts (Write-Only Load)** | | | | |
   | `concurrent_puts_v1` | v1 | **1,387,740** | ±35,309 | **2.83x** |
   | `concurrent_puts_v2` | v2 | **3,933,324** | ±58,046 | |
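The gain column is the ratio of the mean scores. The hypothetical helper below recomputes it from the raw scores and also computes a pessimistic ratio that takes v2 at the low end and v1 at the high end of their reported ± errors; treating JMH's error as a hard bound is a simplification of mine, not something the benchmark output states.

```java
/** Hypothetical helper: recomputes the gain column and a pessimistic variant. */
final class SpeedupCheck {
  /** Plain ratio of mean throughputs (v2 / v1). */
  static double speedup(double v1, double v2) {
    return v2 / v1;
  }

  /** v2 at the low end of its error bar divided by v1 at the high end of its error bar. */
  static double pessimisticSpeedup(double v1, double v1Err, double v2, double v2Err) {
    return (v2 - v2Err) / (v1 + v1Err);
  }

  static void report(String name, double v1, double v1Err, double v2, double v2Err) {
    System.out.printf("%-12s gain %.2fx, pessimistic %.2fx%n",
        name, speedup(v1, v2), pessimisticSpeedup(v1, v1Err, v2, v2Err));
  }

  public static void main(String[] args) {
    // Scores and errors (ops/s) copied from the raw JMH results.
    report("mixed", 1_857_864.371, 57_408.178, 5_614_289.356, 96_352.346);
    report("mixed get", 814_891.042, 75_165.491, 4_298_377.070, 114_633.945);
    report("mixed put", 1_042_973.329, 49_868.486, 1_315_912.286, 32_133.146);
    report("puts only", 1_387_740.110, 35_309.681, 3_933_324.449, 58_046.222);
  }
}
```

With these numbers the pessimistic ratio stays above 1x for all four comparisons, the closest being the put path under mixed load at roughly 1.17x.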
   
   
   
   Raw results:
   ```
   Benchmark                                                                Mode  Cnt        Score        Error  Units
   QueryCacheBenchmark.concurrentGetAndPuts                                thrpt   25  1857864.371 ±  57408.178  ops/s
   QueryCacheBenchmark.concurrentGetAndPuts:concurrentGetAndPuts_get       thrpt   25   814891.042 ±  75165.491  ops/s
   QueryCacheBenchmark.concurrentGetAndPuts:concurrentGetAndPuts_put       thrpt   25  1042973.329 ±  49868.486  ops/s
   QueryCacheBenchmark.concurrentGetAndPuts_v2                             thrpt   25  5614289.356 ±  96352.346  ops/s
   QueryCacheBenchmark.concurrentGetAndPuts_v2:concurrentGetAndPuts_getV2  thrpt   25  4298377.070 ± 114633.945  ops/s
   QueryCacheBenchmark.concurrentGetAndPuts_v2:concurrentGetAndPuts_putV2  thrpt   25  1315912.286 ±  32133.146  ops/s
   QueryCacheBenchmark.concurrent_puts_v1                                  thrpt   25  1387740.110 ±  35309.681  ops/s
   QueryCacheBenchmark.concurrent_puts_v2                                  thrpt   25  3933324.449 ±  58046.222  ops/s
   ```
   
   I only assumed 16 Lucene segments for this test, which is low for an 
OpenSearch node hosting multiple indices; with more segments, I would expect a 
larger improvement. Also, eviction on segment merges will be handled on a 
separate thread in v2, which this benchmark does not account for, but even with 
that, I expect v2 to remain considerably faster.
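Handing merge-time eviction to a separate thread could look roughly like the sketch below: the caller only schedules the sweep and returns, while a single daemon thread scans the partitions one lock at a time. The names (`BackgroundEvictionSketch`, `onSegmentClosed`) and the single-thread executor are assumptions for illustration, not the planned v2 design:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: segment-merge eviction handed off to one background thread. */
final class BackgroundEvictionSketch implements AutoCloseable {
  record CompositeKey(Object cacheKey, Object query) {}

  private final Map<CompositeKey, Object>[] partitions;
  private final ExecutorService evictor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "query-cache-evictor");
    t.setDaemon(true); // do not keep the JVM alive just for cleanup
    return t;
  });

  @SuppressWarnings("unchecked")
  BackgroundEvictionSketch(int numPartitions) {
    partitions = (Map<CompositeKey, Object>[]) new Map<?, ?>[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      partitions[i] = new HashMap<>();
    }
  }

  void put(Object cacheKey, Object query, Object value) {
    CompositeKey key = new CompositeKey(cacheKey, query);
    Map<CompositeKey, Object> p = partitions[Math.floorMod(key.hashCode(), partitions.length)];
    synchronized (p) {
      p.put(key, value);
    }
  }

  /** Returns immediately; the full-partition sweep runs on the evictor thread. */
  CompletableFuture<Void> onSegmentClosed(Object cacheKey) {
    return CompletableFuture.runAsync(() -> {
      for (Map<CompositeKey, Object> p : partitions) {
        synchronized (p) { // lookups on other partitions proceed meanwhile
          p.keySet().removeIf(k -> k.cacheKey().equals(cacheKey));
        }
      }
    }, evictor);
  }

  int size() {
    int total = 0;
    for (Map<CompositeKey, Object> p : partitions) {
      synchronized (p) {
        total += p.size();
      }
    }
    return total;
  }

  @Override
  public void close() {
    evictor.shutdown();
  }
}
```

The trade-off in this sketch is that entries of a merged-away segment keep occupying memory until the sweep runs, although they should not be served, since no new lookups use that segment's cache key.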
   
   
   
   
   

