benwtrent opened a new issue, #15839:
URL: https://github.com/apache/lucene/issues/15839

   ### Description
   
   Besides the overall performance improvements that could be made to HNSW & 
block join queries, I think there are ways for us to improve the vector story 
as a whole.
   
   A major one is scoring ALL child docs of a nearest parent doc at one time. 
When we score candidates, the idea is to score together the candidates that are 
children of the same parent, and ALSO score ALL the other children of that 
common parent. 
   
   This would be complicated, and a POC would likely be required to prove 
whether it's useful, but it would allow us to:
   
    - Bulk score all matching children in a block (super fast, locality on 
disk, etc.)
    - Bulk collect the children all within a parent, keeping the translation 
times simple
    - It MAY increase the scoring count (e.g. now for a single node in the 
graph, we might score 10s or 100s of vectors :/), so maybe it's something that 
only occurs once we get further into the graph...
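
As a rough illustration of the bulk-scoring idea in the list above, here is a minimal Java sketch. It is not Lucene API: `ParentBlockScorer`, its fields, and the plain dot-product similarity are hypothetical, and it uses `java.util.BitSet` in place of Lucene's parent bit set. It assumes the usual block-join layout, where the children of a parent are indexed contiguously and immediately before the parent doc, and it assumes vectors can be looked up directly by doc ID.

```java
import java.util.BitSet;

/** Hypothetical sketch of per-block bulk scoring; not part of Lucene. */
final class ParentBlockScorer {
  private final BitSet parentBits; // one bit set per parent doc ID (assumption)
  private final float[][] vectors; // vectors[docId]; null for parent docs (assumption)

  ParentBlockScorer(BitSet parentBits, float[][] vectors) {
    this.parentBits = parentBits;
    this.vectors = vectors;
  }

  /** The parent of a child is the first parent bit at or after the child. */
  int parentOf(int childDoc) {
    return parentBits.nextSetBit(childDoc);
  }

  /** The first child sits right after the previous parent (or at doc 0). */
  int firstChildOf(int parentDoc) {
    return parentBits.previousSetBit(parentDoc - 1) + 1;
  }

  /** Scores every child of parentDoc in one sequential pass over the block. */
  float[] scoreBlock(float[] query, int parentDoc) {
    int first = firstChildOf(parentDoc);
    float[] scores = new float[parentDoc - first];
    for (int doc = first; doc < parentDoc; doc++) {
      scores[doc - first] = dot(query, vectors[doc]);
    }
    return scores;
  }

  /** Plain dot product, standing in for whatever similarity the field uses. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

Given a candidate child found during graph traversal, a caller would resolve `parentOf(child)` once and then call `scoreBlock`, so the child-to-parent translation happens once per block rather than once per child.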
   
   
   It's logical that children are all near each other in the graph.
   
   This will be a pretty large departure from the current API design. The 
KnnCollector would need to:
   
    - Provide the scoring logic (but not the score methodology)
    - Keep track of "visited" nodes
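
To make the "visited" bookkeeping more concrete, here is one possible shape for it, sketched in Java. This is not the existing `KnnCollector` interface: `BlockVisitTracker` and `claimBlock` are hypothetical names, and the sketch assumes that graph node ordinals equal doc IDs, that children immediately precede their parent, and that callers only pass child docs. The idea is that the first time any child of a block is reached, the collector claims the whole block for scoring, and later arrivals at siblings are skipped.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical collector-side bookkeeping; not the real KnnCollector. */
final class BlockVisitTracker {
  private final BitSet parentBits; // one bit set per parent doc ID (assumption)
  private final BitSet visited;    // children already handed out for scoring
  private int scoredCount;         // how many vectors we asked to score so far

  BlockVisitTracker(BitSet parentBits, int maxDoc) {
    this.parentBits = parentBits;
    this.visited = new BitSet(maxDoc);
  }

  /**
   * Returns the not-yet-visited children in the block containing childDoc and
   * marks them visited. Returns an empty array if childDoc was already visited.
   */
  int[] claimBlock(int childDoc) {
    if (visited.get(childDoc)) {
      return new int[0];
    }
    int parent = parentBits.nextSetBit(childDoc);
    int first = parentBits.previousSetBit(parent - 1) + 1;
    int[] toScore = new int[parent - first];
    int n = 0;
    for (int doc = first; doc < parent; doc++) {
      if (!visited.get(doc)) {
        visited.set(doc);
        toScore[n++] = doc;
      }
    }
    scoredCount += n;
    return Arrays.copyOf(toScore, n);
  }

  /** Total vectors claimed for scoring, useful for watching the scoring count. */
  int scoredCount() {
    return scoredCount;
  }
}
```

Tracking `scoredCount` separately from the number of graph nodes expanded would make it easier to measure how much the per-block approach inflates the scoring count in a POC.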
   
   This would also enable some neat augmentations, like the ability to return 
the average, min, and max score for a parent, and to return more than the 
single nearest vector per parent (e.g. the top 5 or whatever).
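
Once all children of a parent are scored together, these per-parent aggregates are cheap to compute. The Java sketch below shows one way to accumulate them; `ParentScoreSummary` is a hypothetical helper, not an existing Lucene class, and it assumes `topN` is at least 1.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

/** Hypothetical per-parent accumulator for child scores; not part of Lucene. */
final class ParentScoreSummary {
  private final int topN;
  private final PriorityQueue<Float> top = new PriorityQueue<>(); // min-heap of best scores
  private float min = Float.POSITIVE_INFINITY;
  private float max = Float.NEGATIVE_INFINITY;
  private double sum;
  private int count;

  ParentScoreSummary(int topN) {
    this.topN = topN;
  }

  /** Folds one child score into the running statistics. */
  void add(float score) {
    min = Math.min(min, score);
    max = Math.max(max, score);
    sum += score;
    count++;
    top.offer(score);
    if (top.size() > topN) {
      top.poll(); // evict the lowest of the retained scores
    }
  }

  float min() { return min; }

  float max() { return max; }

  float average() { return (float) (sum / count); }

  /** The best topN child scores seen so far, highest first. */
  float[] topScores() {
    Float[] boxed = top.toArray(new Float[0]);
    Arrays.sort(boxed, Collections.reverseOrder());
    float[] out = new float[boxed.length];
    for (int i = 0; i < boxed.length; i++) {
      out[i] = boxed[i];
    }
    return out;
  }
}
```

A collector could keep one such summary per parent and decide at the end which statistic (max, average, or the top few children) to expose as the parent's result.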


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

