iprithv opened a new pull request, #16047:
URL: https://github.com/apache/lucene/pull/16047

   ## Description
   
   This implements a proof of concept for sibling scoring in block join 
diversified child vector search, discussed in [#15839 — Maybe Improve join 
block Vector search performance by block scoring child 
vectors](https://github.com/apache/lucene/issues/15839).
   
   Today, DiversifyingChildrenFloatKnnVectorQuery and 
DiversifyingChildrenByteKnnVectorQuery can return, for a given parent, the 
child vector that HNSW reached first, even when another sibling in the same 
parent block has higher similarity. This PR adds a post-HNSW rescoring pass on 
the approximate path only. For each provisional hit, it iterates all live child 
doc ids in that parent's block (in ascending doc order, so VectorScorer stays 
forward-only), re-scores each one with the real VectorScorer, and keeps the 
best child. The hits are then re-sorted by score. acceptDocs is respected. 
Exact search is unchanged.
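
The rescoring pass described above can be illustrated with a minimal, Lucene-free sketch. It is not the code in this PR: `BlockRescoreSketch`, `Hit`, `rescore`, and `demo` are hypothetical names, plain arrays and `java.util.BitSet` stand in for the index, the parent bit set, and acceptDocs, and a dot product stands in for VectorScorer. It assumes the usual block join layout in which each parent doc directly follows its children.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BlockRescoreSketch {
    /** Hypothetical (child doc id, score) pair standing in for a provisional hit. */
    record Hit(int doc, float score) {}

    /** Dot product, used here as a stand-in for the real similarity function. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * For each provisional child hit, scan every accepted child in the same
     * parent block in ascending doc order, keep the best-scoring child, and
     * finally sort all surviving hits by descending score.
     */
    static List<Hit> rescore(List<Hit> provisional, BitSet parents, BitSet accept,
                             float[][] vectors, float[] query) {
        List<Hit> out = new ArrayList<>();
        for (Hit hit : provisional) {
            // The parent is the first parent bit at or after the child doc id.
            int parent = parents.nextSetBit(hit.doc());
            // The block starts right after the previous parent (or at doc 0).
            int firstChild = parents.previousSetBit(parent - 1) + 1;
            int bestDoc = -1;
            float bestScore = Float.NEGATIVE_INFINITY;
            for (int child = firstChild; child < parent; child++) {
                if (!accept.get(child)) {
                    continue; // deleted or filtered-out child
                }
                float score = dot(query, vectors[child]);
                if (score > bestScore) {
                    bestScore = score;
                    bestDoc = child;
                }
            }
            // Fall back to the original hit if no accepted child was found.
            out.add(bestDoc >= 0 ? new Hit(bestDoc, bestScore) : hit);
        }
        out.sort((a, b) -> Float.compare(b.score(), a.score()));
        return out;
    }

    /** Two blocks: children 0..2 with parent 3, children 4..6 with parent 7. */
    static List<Hit> demo() {
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(7);
        BitSet accept = new BitSet();
        accept.set(0, 3); // children 0, 1, 2
        accept.set(4);
        accept.set(6);    // child 5 is intentionally not accepted
        float[][] vectors = {
            {0.2f, 0f}, {0.9f, 0f}, {0.5f, 0f}, null,
            {0.7f, 0f}, {0.95f, 0f}, {0.1f, 0f}, null
        };
        float[] query = {1f, 0f};
        // Pretend graph traversal surfaced child 0 and child 4 first.
        List<Hit> provisional = List.of(new Hit(0, 0.2f), new Hit(4, 0.7f));
        return rescore(provisional, parents, accept, vectors, query);
    }

    public static void main(String[] args) {
        for (Hit h : demo()) {
            System.out.println(h.doc() + " " + h.score());
        }
    }
}
```

In this toy setup the first block's hit moves from child 0 to its better sibling, child 1, while the second block keeps child 4 because the higher-scoring child 5 is not in the accepted set. The inner loop touches every child of each surfaced parent, which is why the cost grows with block width and with the number of hits.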
   
   This is intentionally smaller than the issue’s exploratory design (collector 
driving scoring during traversal, visited tracking, richer aggregates like 
min/max/top‑n per parent). It is meant to demonstrate correctness within the 
block for parents already surfaced by HNSW, with measurable overhead.
   
   ---
   
   ## Benchmarks (JMH)
   
   | children / parent | Baseline | + block rescore |
   |-------------------|---------|----------------|
   | 16 | ~0.17 ms/op | ~0.22 ms/op |
   | 64 | ~0.26 ms/op | ~0.42 ms/op |
   
   Overhead **scales with block width**: roughly +0.05 ms/op at 16 children 
per parent and +0.16 ms/op at 64. It also scales with `topK`, since more blocks 
are rescored. The change trades CPU for correctness within each parent block.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

