kaivalnp opened a new pull request, #12590: URL: https://github.com/apache/lucene/pull/12590
### Context

Vector search is performed in [`AbstractKnnVectorQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java), where individual HNSW searches are delegated to sub-classes via [`#approximateSearch`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L174). This is useful for implementing custom functionality, for example pro-rating `k` across segments for performance, or requesting some results beyond `k` for higher recall with exact KNN.

While the class itself is `package-private`, we extend its `public` sub-classes for [byte](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/KnnByteVectorQuery.java) and [float](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java) vectors to implement custom functionality like the above.

### Issue

After searching across all segments, we retain the index-level `topK` results [here](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L88) and immediately call [`#createRewrittenQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219) to rewrite them as a [`DocAndScoreQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L297).

As a result, implementing classes have no access to the final `topK` results, even though reading or modifying them would be useful as a post-processing step (for example, emitting metrics, or counting results / keeping only those above a threshold).

### Proposal

Can we make [`#createRewrittenQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219) `protected` to
allow sub-classes to override it (and ultimately access the `topK` results)? They can simply delegate to the original method at the end, or even implement a custom `Query` if required.

I don't know how common this use case is, and wanted to get some opinions.

Closes #12575
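To illustrate the proposed hook, here is a minimal, self-contained sketch of how a sub-class could post-process `topK` if `#createRewrittenQuery` were `protected`. It does not use Lucene: `BaseKnnQuery`, `Hit`, `RewrittenQuery`, and `ThresholdKnnQuery` are hypothetical stand-ins for `AbstractKnnVectorQuery`, `ScoreDoc`, `DocAndScoreQuery`, and a custom sub-class. The merge step is a plain sort rather than Lucene's own merge, and doc IDs are assumed to already be index-global. The sub-class records how many results came back (for example, for a metric), keeps only hits at or above a score threshold, and then delegates to the default rewrite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
  // Hypothetical stand-in for a (doc id, score) pair such as Lucene's ScoreDoc.
  // Doc ids are assumed to be index-global already.
  record Hit(int doc, float score) {}

  // Hypothetical stand-in for the rewritten query (DocAndScoreQuery in Lucene).
  record RewrittenQuery(int[] docs, float[] scores) {}

  // Hypothetical stand-in for AbstractKnnVectorQuery: merges per-segment hits into
  // an index-level top-k, then passes them to an overridable hook.
  abstract static class BaseKnnQuery {
    final int k;

    BaseKnnQuery(int k) {
      this.k = k;
    }

    final RewrittenQuery rewrite(List<List<Hit>> perSegmentHits) {
      List<Hit> all = new ArrayList<>();
      perSegmentHits.forEach(all::addAll);
      // Simplified merge: sort by descending score and keep at most k hits.
      all.sort((a, b) -> Float.compare(b.score(), a.score()));
      List<Hit> topK = all.subList(0, Math.min(k, all.size()));
      return createRewrittenQuery(topK);
    }

    // The proposed change: protected instead of private, so sub-classes can see topK.
    protected RewrittenQuery createRewrittenQuery(List<Hit> topK) {
      int[] docs = new int[topK.size()];
      float[] scores = new float[topK.size()];
      for (int i = 0; i < topK.size(); i++) {
        docs[i] = topK.get(i).doc();
        scores[i] = topK.get(i).score();
      }
      return new RewrittenQuery(docs, scores);
    }
  }

  // Hypothetical sub-class: records the topK size (e.g. for metric emission),
  // drops hits below a score threshold, then delegates to the default rewrite.
  static class ThresholdKnnQuery extends BaseKnnQuery {
    final float minScore;
    int lastTopKCount;

    ThresholdKnnQuery(int k, float minScore) {
      super(k);
      this.minScore = minScore;
    }

    @Override
    protected RewrittenQuery createRewrittenQuery(List<Hit> topK) {
      lastTopKCount = topK.size();
      List<Hit> kept = topK.stream().filter(h -> h.score() >= minScore).toList();
      return super.createRewrittenQuery(kept);
    }
  }

  public static void main(String[] args) {
    ThresholdKnnQuery query = new ThresholdKnnQuery(3, 0.5f);
    RewrittenQuery rewritten = query.rewrite(List.of(
        List.of(new Hit(0, 0.9f), new Hit(1, 0.4f)),     // segment 0
        List.of(new Hit(10, 0.7f), new Hit(11, 0.2f)))); // segment 1
    // prints "[0, 10] 3": 3 hits in topK, 2 of them at or above 0.5
    System.out.println(Arrays.toString(rewritten.docs()) + " " + query.lastTopKCount);
  }
}
```

Delegating through `super.createRewrittenQuery(...)` keeps the default rewrite logic in one place, so the override only adds the post-processing step.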