sgup432 commented on code in PR #15124:
URL: https://github.com/apache/lucene/pull/15124#discussion_r2357058701
##########
lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java:
##########
@@ -426,10 +425,8 @@ public void clear() {
}
private static long getRamBytesUsed(Query query) {
- return LINKED_HASHTABLE_RAM_BYTES_PER_ENTRY
- + (query instanceof Accountable accountableQuery
- ? accountableQuery.ramBytesUsed()
- : QUERY_DEFAULT_RAM_BYTES_USED);
+ long queryRamBytesUsed = RamUsageEstimator.sizeOf(query, 0);
+ return LINKED_HASHTABLE_RAM_BYTES_PER_ENTRY + queryRamBytesUsed;
Review Comment:
@benwtrent I wrote a simple micro-benchmark for the `putIfAbsent` method,
which is the one that calls `getRamBytesUsed(Query query)` during caching and
eviction.
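
For context, a rough sketch (not Lucene code) of the two sizing strategies in the diff above: a flat default that costs constant time per call, and an estimate that walks the query tree on every call. `Node`, `Leaf`, `Composite` and both constants are hypothetical stand-ins, and it assumes the new estimate visits nested clauses, which is what the profile discussion below suggests.

```java
import java.util.List;

final class SizingSketch {
  // Hypothetical stand-ins for query types; these are not Lucene classes.
  interface Node {}

  record Leaf(String field, String text) implements Node {}

  record Composite(List<Node> clauses) implements Node {}

  // Assumed constants, chosen only for illustration.
  static final long DEFAULT_BYTES = 1024;
  static final long PER_NODE_BYTES = 16;

  // Flat strategy: constant time, ignores the shape of the query.
  static long flatEstimate(Node node) {
    return DEFAULT_BYTES;
  }

  // Walking strategy: recursive, so the work grows with the number of nodes.
  static long walkingEstimate(Node node) {
    if (node instanceof Leaf leaf) {
      // Two bytes per char is a crude stand-in for the string payload.
      return PER_NODE_BYTES + 2L * (leaf.field().length() + leaf.text().length());
    }
    long total = PER_NODE_BYTES;
    for (Node clause : ((Composite) node).clauses()) {
      total += walkingEstimate(clause);
    }
    return total;
  }

  public static void main(String[] args) {
    Node query =
        new Composite(
            List.of(new Leaf("foo", "bar1"), new Leaf("foo", "quux1"), new Leaf("foo", "foo1")));
    System.out.println("flat estimate:    " + flatEstimate(query));
    System.out.println("walking estimate: " + walkingEstimate(query));
  }
}
```

The walking estimate touches every clause, so a three-clause boolean query does several times the work of the flat lookup on each cache insertion.
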
I created a cache with `MAX_SIZE = 10000` and `MAX_SIZE_IN_BYTES = 1048576`,
and created N sample boolean queries. The logic looks something like this:
```
for (int i = 0; i < MAX_SIZE; i++) {
  TermQuery must = new TermQuery(new Term("foo", "bar" + i));
  TermQuery filter = new TermQuery(new Term("foo", "quux" + i));
  TermQuery mustNot = new TermQuery(new Term("foo", "foo" + i));
  BooleanQuery.Builder bq = new BooleanQuery.Builder();
  bq.add(must, BooleanClause.Occur.FILTER);
  bq.add(filter, BooleanClause.Occur.FILTER);
  bq.add(mustNot, BooleanClause.Occur.MUST_NOT);
  queries[i] = bq.build();
}
```
JMH method to test a 100% write workload with 10 threads:
```
@Benchmark
@Group("concurrentPutOnly")
@GroupThreads(10)
public void testConcurrentPuts() {
  int random = ThreadLocalRandom.current().nextInt(MAX_SIZE);
  queryCache.putIfAbsent(
      queries[random], this.sampleCacheAndCount, cacheHelpers[random & (SEGMENTS - 1)]);
}
```
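
A side note on the `random & (SEGMENTS - 1)` index in the benchmark method: the mask behaves like a cheap modulo only when `SEGMENTS` is a power of two. The value of `SEGMENTS` is not shown here, so the 16 below is an assumption, and `maskIndex` is a hypothetical helper written just for this sketch.

```java
final class MaskIndexSketch {
  // Hypothetical helper: maps a non-negative value into [0, size) with a bit mask.
  // Only valid when size is a power of two, which is checked explicitly.
  static int maskIndex(int value, int size) {
    if (size <= 0 || Integer.bitCount(size) != 1) {
      throw new IllegalArgumentException("size must be a power of two: " + size);
    }
    return value & (size - 1);
  }

  public static void main(String[] args) {
    int segments = 16; // assumed value; the real SEGMENTS constant is not shown
    for (int value : new int[] {0, 15, 16, 9999}) {
      // For non-negative values the mask and the modulo agree.
      System.out.println(value + " -> " + maskIndex(value, segments)
          + " (modulo gives " + (value % segments) + ")");
    }
  }
}
```

If `SEGMENTS` were not a power of two, the mask would skip some indices, so some cache helpers would never be selected.
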
Baseline numbers
```
Benchmark                               Mode  Cnt        Score       Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt   15  4102080.220 ± 80816.546  ops/s
```
My changes
```
Benchmark                               Mode  Cnt       Score       Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt   15  901925.345 ± 38059.230  ops/s
```
So it became ~4.5x slower. There were a lot of evictions as well. That said,
this is probably one of the worst-case scenarios: a write-only workload with
boolean queries. For a mixed read/write workload or simple filter queries, the
slowdown might be much smaller.
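
The ~4.5x figure is just the ratio of the two throughput scores reported above. A minimal sketch of that arithmetic, where `slowdown` is a hypothetical helper name:

```java
final class SlowdownSketch {
  // Ratio of baseline throughput to changed throughput; values above 1 mean slower.
  static double slowdown(double baselineOpsPerSec, double changedOpsPerSec) {
    return baselineOpsPerSec / changedOpsPerSec;
  }

  public static void main(String[] args) {
    double baseline = 4102080.220; // ops/s, baseline run
    double changed = 901925.345; // ops/s, run with the change applied
    System.out.println("slowdown factor: " + slowdown(baseline, changed));
  }
}
```

This compares the mean scores only and ignores the reported error margins.
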
And this is the method profile (taken from JFR). Kind of expected.
<img width="1498" height="844" alt="Screenshot 2025-09-17 at 3 53 18 PM"
src="https://github.com/user-attachments/assets/0f4e9e80-b8c1-422d-90b9-2c62fce4449c"
/>
I don't know if there is a way around this, or whether we can further improve
the query visitor logic itself.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]