[PR] DiversifyingChildren benchmark [lucene]

via GitHub Mon, 18 May 2026 07:11:46 -0700


aruggero opened a new pull request, #16082:
URL: https://github.com/apache/lucene/pull/16082


   ## Summary
   This PR adds a new JMH microbenchmark for 
`DiversifyingChildrenFloatKnnVectorQuery` (the join-based parent-child KNN 
query), which was previously lacking dedicated performance coverage in 
`lucene/benchmark-jmh`.
   
   ## Motivation
   `DiversifyingChildrenFloatKnnVectorQuery` operates over a nested document 
structure (child vectors + parent block). A dedicated benchmark enables 
measurement and tracking of query latency across realistic corpus shapes and 
query configurations.
   It also provides a baseline for evaluating future performance 
improvements in nested KNN search.
     
   ## What the benchmark does
   The `DiversifyingChildrenKnnQueryBenchmark` benchmark builds an index of 
parent–child document blocks, where each parent owns a configurable number of 
child documents, each carrying a random float vector. A pool of 256 
pre-generated unit query vectors is rotated during measurement to avoid caching 
effects.
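
The query-vector rotation described above can be sketched in plain Java. This is a minimal illustration, not the code from the PR: the class name `QueryVectorPool`, the Gaussian sampling, and the fixed seed are all assumptions, and the sketch is single-threaded.

```java
import java.util.Random;

/** Hypothetical helper: a fixed pool of unit-length query vectors handed out round-robin. */
final class QueryVectorPool {
  private final float[][] pool;
  private int cursor; // index of the next vector to hand out

  QueryVectorPool(int size, int dim, long seed) {
    Random random = new Random(seed); // fixed seed (an assumption) keeps runs repeatable
    pool = new float[size][];
    for (int i = 0; i < size; i++) {
      pool[i] = randomUnitVector(dim, random);
    }
  }

  /** Draws Gaussian components, then scales the vector to L2 norm 1. */
  static float[] randomUnitVector(int dim, Random random) {
    float[] v = new float[dim];
    double sumSquares = 0;
    for (int i = 0; i < dim; i++) {
      v[i] = (float) random.nextGaussian();
      sumSquares += (double) v[i] * v[i];
    }
    float invNorm = (float) (1.0 / Math.sqrt(sumSquares));
    for (int i = 0; i < dim; i++) {
      v[i] *= invNorm;
    }
    return v;
  }

  /** Returns the next query vector, wrapping around at the end of the pool. */
  float[] next() {
    float[] v = pool[cursor];
    cursor = (cursor + 1) % pool.length;
    return v;
  }

  public static void main(String[] args) {
    QueryVectorPool queries = new QueryVectorPool(256, 128, 42L);
    float[] first = queries.next();
    for (int i = 1; i < 256; i++) {
      queries.next();
    }
    // After 256 calls the cursor has wrapped, so the first array is returned again.
    System.out.println(first == queries.next());
  }
}
```

Generating the pool once, outside the measured method, keeps random-number generation out of the timed path; each measured call then only pays for an array lookup.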
   
   The following parameters are benchmarked:
   | Parameter | Values | Description |
   | --- | --- | --- |
   | numParents | 5000 | Total number of parent groups |
   | childrenPerParent | 4, 50 | Children per parent; controls filter selectivity |
   | k | 10, 100 | Number of top results requested |
   | dim | 128, 768 | Vector dimension |
   
   The benchmark uses SampleTime mode (5 warm-up iterations, 5 measurement 
iterations).
   
   ## Settings
   **BenchmarkMode(Mode.SampleTime)**: rather than averaging all measurements 
into a single number, JMH samples the latency of individual operations and 
builds a histogram from those samples, so p50, p90, p99, and p99.9 are reported 
automatically. 
   For a search benchmark, this matters: HNSW graph traversal has 
variable-length paths (some queries terminate early, some explore more nodes), 
so the mean alone is misleading. Percentiles show whether improvements are 
consistent or only in the best case.
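
To illustrate why the mean can mislead, here is a small sketch that compares the mean with nearest-rank percentiles on made-up latencies. The numbers are invented for illustration, and the nearest-rank method is an assumption; JMH may compute its percentiles differently.

```java
import java.util.Arrays;

/** Hypothetical helper comparing the mean with nearest-rank percentiles. */
final class LatencyPercentiles {
  /** Nearest-rank percentile over samples sorted in ascending order. */
  static double percentile(double[] sortedSamples, double p) {
    int rank = (int) Math.ceil(p * sortedSamples.length / 100.0);
    return sortedSamples[Math.max(rank, 1) - 1];
  }

  static double mean(double[] samples) {
    return Arrays.stream(samples).average().orElse(Double.NaN);
  }

  public static void main(String[] args) {
    // Invented latencies in microseconds: 95 fast queries and 5 slow ones.
    double[] samples = new double[100];
    Arrays.fill(samples, 0, 95, 100.0);
    Arrays.fill(samples, 95, 100, 2000.0);
    Arrays.sort(samples); // already ascending here; sorted for generality

    System.out.println("mean = " + mean(samples));
    System.out.println("p50  = " + percentile(samples, 50));
    System.out.println("p90  = " + percentile(samples, 90));
    System.out.println("p99  = " + percentile(samples, 99));
  }
}
```

With this data the mean is 195 microseconds, a value no query actually took: p50 and p90 are both 100, while p99 is 2000. A change that only speeds up the fast queries would move the mean but leave p99 untouched.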
   
   **Warmup(iterations = 5, time = 2)**: the JVM's JIT compiler needs to 
observe a method being called thousands of times before it applies the most 
aggressive optimisations. 
   HNSW traversal involves polymorphic call sites, priority queue operations, 
and BitSet accesses, so JIT convergence takes longer than in simpler 
benchmarks. 
   5 iterations × 2s = 10s should give the JIT enough invocations to optimise 
the hot paths before measurement begins.
     
   **Measurement(iterations = 5, time = 5)**: more time per iteration means 
more samples collected per iteration, since SampleTime draws its samples from 
the calls made while the iteration runs. 
5 iterations × 5s × 1 fork = 25s of sampling per combination, which gives 
enough data points for reliable percentile estimates on fast combinations; tail 
percentiles for the heaviest combinations (high dim, many children) remain 
approximate because each query takes longer and fewer samples are collected.
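
The sample-budget reasoning above can be made concrete with a rough estimate. The per-query latencies below are hypothetical, not measured results, and the operation count is only an upper bound for a single thread, since SampleTime may record fewer samples than there are calls.

```java
/** Hypothetical estimate of how many samples can land in the tail of a 25s window. */
final class SampleBudget {
  /** Upper bound on operations that fit in the measurement window on one thread. */
  static long maxOperations(double windowSeconds, double latencyMicros) {
    return (long) (windowSeconds * 1_000_000.0 / latencyMicros);
  }

  /** Expected number of samples that fall above the given percentile. */
  static double samplesAbove(long samples, double percentile) {
    return samples * (100.0 - percentile) / 100.0;
  }

  public static void main(String[] args) {
    double windowSeconds = 5 * 5.0 * 1; // iterations x seconds x forks
    double[] hypotheticalLatenciesMicros = {50.0, 20_000.0}; // invented fast and slow cases
    for (double latency : hypotheticalLatenciesMicros) {
      long ops = maxOperations(windowSeconds, latency);
      System.out.printf(
          "latency=%.0fus  ops<=%d  samples above p99.9<=%.2f%n",
          latency, ops, samplesAbove(ops, 99.9));
    }
  }
}
```

Under these assumptions a 50 microsecond query yields up to 500,000 operations, leaving hundreds of samples above p99.9, whereas a 20 millisecond query yields at most 1,250 operations and roughly one sample above p99.9, which is too few for a stable estimate.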
   
   ## How to run
   ```
   ./gradlew -p lucene/benchmark-jmh assemble
   java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-*.jar DiversifyingChildrenKnnQueryBenchmark
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

