kaivalnp opened a new pull request, #12590:
URL: https://github.com/apache/lucene/pull/12590

   ### Context
   
   Vector search is performed in 
[`AbstractKnnVectorQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java),
 where individual HNSW searches are delegated to sub-classes via 
[`#approximateSearch`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L174).
 This is useful for implementing custom functionality (for example, pro-rating `k` 
across segments for performance, or requesting additional results beyond `k` 
for higher recall with exact KNN).
   
   While the class itself is `package-private`, we extend its `public` 
sub-classes for 
[byte](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/KnnByteVectorQuery.java)
 and 
[float](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java)
 vectors to implement custom functionality like the above.
   
   ### Issue
   
   After searching across all segments, we retain the index-level `topK` 
results 
[here](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L88),
 and immediately call 
[`#createRewrittenQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219)
 to rewrite them as a 
[`DocAndScoreQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L297).
   
   So implementing classes never get access to the final `topK` results, 
which would be useful to read or modify in a post-processing step (for 
example, emitting metrics, or counting or keeping only the results above a threshold).
   
   ### Proposal
   
   Can we make 
[`#createRewrittenQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219)
 `protected` to allow sub-classes to override it (and ultimately access the 
`topK` results)?
   They can simply delegate to the original function at the end, or even 
implement a custom `Query` if required.
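
A minimal, self-contained sketch of the override pattern described above. Everything in it is a hypothetical stand-in rather than the Lucene API: `Hit` stands in for a scored document, `BaseKnnQuery` for `AbstractKnnVectorQuery`, and a plain `String` for the rewritten `Query`. The real `#createRewrittenQuery` signature differs; this only illustrates how a sub-class could inspect or filter the final `topK` results and then delegate to the original function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one scored result (not a Lucene class).
final class Hit {
  final int doc;
  final float score;

  Hit(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

// Hypothetical stand-in for AbstractKnnVectorQuery.
abstract class BaseKnnQuery {
  // Stand-in for the rewrite step: the index-level topK results are
  // handed straight to the hook below.
  final String rewrite(List<Hit> topK) {
    return createRewrittenQuery(topK);
  }

  // The hook that this PR proposes to make protected. A String stands in
  // for the DocAndScoreQuery that the real method builds.
  protected String createRewrittenQuery(List<Hit> topK) {
    return "DocAndScoreQuery(" + topK.size() + " hits)";
  }
}

// Example sub-class: keeps only results at or above a score threshold,
// records a count (e.g. for metric emission), then delegates.
class ThresholdKnnQuery extends BaseKnnQuery {
  private final float minScore;
  int lastKeptCount = -1; // -1 until the first rewrite

  ThresholdKnnQuery(float minScore) {
    this.minScore = minScore;
  }

  @Override
  protected String createRewrittenQuery(List<Hit> topK) {
    List<Hit> kept = new ArrayList<>();
    for (Hit hit : topK) {
      if (hit.score >= minScore) {
        kept.add(hit);
      }
    }
    lastKeptCount = kept.size();
    // Delegate to the original function at the end.
    return super.createRewrittenQuery(kept);
  }
}
```

With the hook left `private`, the filtering and counting in `ThresholdKnnQuery` would have no place to run, since the final `topK` results never leave the base class.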
   
   I don't know how common this use case is, and wanted to get some opinions.
   
   Closes #12575


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


