aruggero opened a new pull request, #16082:
URL: https://github.com/apache/lucene/pull/16082
## Summary
This PR adds a new JMH microbenchmark for
`DiversifyingChildrenFloatKnnVectorQuery` (the join-based parent-child KNN
query), which was previously lacking dedicated performance coverage in
`lucene/benchmark-jmh`.
## Motivation
`DiversifyingChildrenFloatKnnVectorQuery` operates over a nested document
structure (child vectors + parent block). A dedicated benchmark enables
measurement and tracking of query latency across realistic corpus shapes and
query configurations.
It also provides a baseline for evaluating future performance
improvements in nested KNN search.
## What the benchmark does
The `DiversifyingChildrenKnnQueryBenchmark` benchmark builds an index of
parent–child document blocks, where each parent owns a configurable number of
child documents, each carrying a random float vector. A pool of 256
pre-generated unit query vectors is rotated during measurement to avoid caching
effects.
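The query-vector rotation described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the class name `QueryVectorPool`, the method names, the Gaussian sampling and the fixed seed are all assumptions.

```java
import java.util.Random;

/** Hypothetical helper: a fixed pool of unit-length query vectors served round-robin. */
final class QueryVectorPool {
  private final float[][] pool;
  private int next; // index of the vector returned by the next call

  QueryVectorPool(int size, int dim, long seed) {
    Random random = new Random(seed); // assumed: a fixed seed keeps runs comparable
    pool = new float[size][];
    for (int i = 0; i < size; i++) {
      pool[i] = randomUnitVector(random, dim);
    }
  }

  /** Draws Gaussian components, then scales the vector to L2 norm 1. */
  static float[] randomUnitVector(Random random, int dim) {
    float[] v = new float[dim];
    double sumOfSquares = 0;
    for (int i = 0; i < dim; i++) {
      v[i] = (float) random.nextGaussian();
      sumOfSquares += (double) v[i] * v[i];
    }
    float inverseNorm = (float) (1.0 / Math.sqrt(sumOfSquares));
    for (int i = 0; i < dim; i++) {
      v[i] *= inverseNorm;
    }
    return v;
  }

  /** Returns the next query vector, wrapping around at the end of the pool. */
  float[] nextQuery() {
    float[] query = pool[next];
    next = (next + 1) % pool.length;
    return query;
  }
}
```

With a pool of 256 vectors, the same query is reissued only every 256th invocation, so consecutive operations do not repeat the same graph traversal.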
The following parameters are benchmarked:
| Parameter | Values | Description |
| --- | --- | --- |
| numParents | 5000 | Total number of parent groups |
| childrenPerParent | 4, 50 | Children per parent; controls filter selectivity |
| k | 10, 100 | Number of top results requested |
| dim | 128, 768 | Vector dimension |
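To get a feel for the size of this grid, the sketch below counts the parameter combinations and the child vectors per corpus shape. It assumes the parameters are declared as JMH `@Param` fields (in which case JMH runs the full cross product), which this description does not show; the class `ParameterGrid` is purely illustrative.

```java
/** Hypothetical helper that mirrors the parameter table; not part of the PR. */
final class ParameterGrid {
  // Values copied from the parameter table.
  static final int[] NUM_PARENTS = {5000};
  static final int[] CHILDREN_PER_PARENT = {4, 50};
  static final int[] K = {10, 100};
  static final int[] DIM = {128, 768};

  /** Number of benchmark runs produced by the full cross product. */
  static int combinations() {
    return NUM_PARENTS.length * CHILDREN_PER_PARENT.length * K.length * DIM.length;
  }

  /** Child vectors indexed for one corpus shape. */
  static long childVectors(int numParents, int childrenPerParent) {
    return (long) numParents * childrenPerParent;
  }

  public static void main(String[] args) {
    System.out.println("combinations = " + combinations());
    for (int children : CHILDREN_PER_PARENT) {
      System.out.println(children + " children/parent -> "
          + childVectors(NUM_PARENTS[0], children) + " child vectors");
    }
  }
}
```

Under these assumptions there are 8 combinations, and the two corpus shapes hold 20,000 and 250,000 child vectors respectively.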
The benchmark uses SampleTime mode (5 warm-up iterations, 5 measurement
iterations).
## Settings
**BenchmarkMode(Mode.SampleTime)**: rather than averaging all measurements
into a single number, JMH samples the latency of individual operations and
builds a histogram from them. This gives p50, p90, p99 and p99.9 automatically.
For a search benchmark, this matters: HNSW graph traversal has
variable-length paths (some queries terminate early, some explore more nodes),
so the mean alone is misleading. Percentiles tell you whether improvements are
consistent or only in the best case.
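A small worked example shows how the mean can hide a slow tail. The latencies below are made up for illustration, and the nearest-rank percentile used here is an assumption: JMH's own percentile computation may differ in detail.

```java
import java.util.Arrays;

/** Hypothetical helper comparing the mean with nearest-rank percentiles. */
final class LatencyPercentiles {
  /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
  static double percentile(double[] samples, double p) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }

  static double mean(double[] samples) {
    return Arrays.stream(samples).average().orElse(Double.NaN);
  }

  /** Made-up data: 98 queries at 1 ms and 2 queries at 50 ms. */
  static double[] exampleSamples() {
    double[] samples = new double[100];
    Arrays.fill(samples, 1.0);
    samples[98] = 50.0;
    samples[99] = 50.0;
    return samples;
  }

  public static void main(String[] args) {
    double[] samples = exampleSamples();
    System.out.println("mean = " + mean(samples));
    System.out.println("p50  = " + percentile(samples, 50));
    System.out.println("p99  = " + percentile(samples, 99));
  }
}
```

For this made-up data the mean is about 1.98 ms, nearly double the median of 1 ms, yet it says nothing about the 50 ms outliers that p99 exposes.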
**Warmup(iterations = 5, time = 2)**: the JVM's JIT compiler needs to
observe a method being called thousands of times before it applies the most
aggressive optimisations.
HNSW traversal involves polymorphic call sites, priority queue operations,
and BitSet accesses, so JIT convergence is likely to take longer than in
simpler benchmarks.
5 iterations × 2s = 10s should give the JIT enough invocations to optimise
the hot paths before measurement begins.
**Measurement(iterations = 5, time = 5)**: more time per iteration means
more samples collected per iteration (SampleTime samples individual calls
throughout the iteration).
5×5s per fork × 1 fork = 25s of samples per combination, which gives enough
data points for JMH to compute reliable percentile estimates for fast
combinations; tail percentiles for the heaviest combinations (high dim, many
children) remain approximate because their longer per-query latency yields
fewer samples in the same window.
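The effect of per-query latency on tail percentiles can be estimated with simple arithmetic. The sketch below is illustrative only: the 0.5 ms and 50 ms latencies are hypothetical, not measured results, and the sample count is an upper bound for a single thread because SampleTime records only a subset of calls.

```java
/** Hypothetical back-of-the-envelope estimate of how many samples support a tail percentile. */
final class SampleBudget {
  /** Upper bound on the calls that fit into the measurement window on one thread. */
  static long maxSamples(double windowSeconds, double latencyMillis) {
    return (long) Math.floor(windowSeconds * 1000.0 / latencyMillis);
  }

  /** Expected number of samples lying above the given percentile. */
  static double samplesInTail(long samples, double percentile) {
    return samples * (100.0 - percentile) / 100.0;
  }

  public static void main(String[] args) {
    double windowSeconds = 5 * 5; // 5 measurement iterations of 5 s, 1 fork
    for (double latencyMillis : new double[] {0.5, 50.0}) { // hypothetical latencies
      long samples = maxSamples(windowSeconds, latencyMillis);
      System.out.println(latencyMillis + " ms/query -> at most " + samples
          + " samples, about " + samplesInTail(samples, 99.9) + " above p99.9");
    }
  }
}
```

Under these assumptions a 0.5 ms query leaves roughly 50 samples above p99.9, while a 50 ms query leaves fewer than one, so its p99.9 is effectively the maximum observed value.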
## How to run
```
./gradlew -p lucene/benchmark-jmh assemble
java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-*.jar \
  DiversifyingChildrenKnnQueryBenchmark
```