[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479014#comment-17479014
 ] 

Michael Sokolov commented on LUCENE-10382:
------------------------------------------

How would this look? An easy first step is to add a filter parameter to 
{{KnnVectorQuery}}:

{{  public KnnVectorQuery(String field, float[] target, int k, Bits filter)}}

Then it can call {{LeafReader.searchNearestVectors}} with 
{{liveDocs.intersect(filter)}} instead of {{liveDocs}}.
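
To make the {{liveDocs.intersect(filter)}} shorthand concrete, here is a minimal sketch of a wrapper that accepts a doc only when both bit sets accept it. {{DocBits}}, {{IntersectedBits}} and {{ArrayBits}} are hypothetical stand-ins defined here so the example compiles on its own; they are not Lucene classes, and treating a null {{liveDocs}} as "no deletions" is an assumption of this sketch.

```java
// Hypothetical stand-in for a read-only bit set view (get/length only).
interface DocBits {
  boolean get(int index);

  int length();
}

// Accepts a doc only if it is live AND passes the filter.
final class IntersectedBits implements DocBits {
  private final DocBits liveDocs; // assumption: null means every doc is live
  private final DocBits filter;

  IntersectedBits(DocBits liveDocs, DocBits filter) {
    if (filter == null) {
      throw new IllegalArgumentException("filter must not be null");
    }
    if (liveDocs != null && liveDocs.length() != filter.length()) {
      throw new IllegalArgumentException("liveDocs and filter must have the same length");
    }
    this.liveDocs = liveDocs;
    this.filter = filter;
  }

  @Override
  public boolean get(int index) {
    return filter.get(index) && (liveDocs == null || liveDocs.get(index));
  }

  @Override
  public int length() {
    return filter.length();
  }
}

// Simple array-backed implementation, used only to exercise the wrapper.
final class ArrayBits implements DocBits {
  private final boolean[] bits;

  ArrayBits(boolean[] bits) {
    this.bits = bits.clone();
  }

  @Override
  public boolean get(int index) {
    return bits[index];
  }

  @Override
  public int length() {
    return bits.length;
  }
}
```

A wrapper like this evaluates the intersection lazily, one doc at a time, so nothing is materialized up front. Whether that is cheaper than building a combined bit set once per segment would need measuring.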

[~julietibs] shared a link on the list to a paper showing how the search 
degenerates for highly selective filters. The authors' approach was to fall 
back to "brute force" KNN when selectivity passes a fixed threshold. We could 
do that too, and it makes sense to me, but I guess the question is: where 
should this fallback happen in the API?
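
Wherever it ends up living, the fallback decision itself could be as small as the sketch below. {{KnnStrategy}}, {{KnnStrategyChooser}} and the {{minAcceptedFraction}} parameter are hypothetical names for illustration. No threshold value is given here or taken from the paper; the caller has to supply one.

```java
// Hypothetical: which search path to take for one segment.
enum KnnStrategy {
  APPROXIMATE, // graph-based search restricted by the filter
  EXACT // brute-force scan over the accepted docs
}

// Picks a strategy from the fraction of docs the filter accepts.
final class KnnStrategyChooser {
  private final double minAcceptedFraction; // assumption: a fixed, caller-supplied cutoff

  KnnStrategyChooser(double minAcceptedFraction) {
    if (minAcceptedFraction < 0.0 || minAcceptedFraction > 1.0) {
      throw new IllegalArgumentException("minAcceptedFraction must be in [0, 1]");
    }
    this.minAcceptedFraction = minAcceptedFraction;
  }

  KnnStrategy choose(int acceptedDocs, int maxDoc) {
    if (maxDoc <= 0) {
      return KnnStrategy.EXACT; // nothing to traverse; the exact scan is trivially cheap
    }
    double acceptedFraction = (double) acceptedDocs / maxDoc;
    // Few accepted docs means a highly selective filter: fall back to the exact scan.
    return acceptedFraction < minAcceptedFraction ? KnnStrategy.EXACT : KnnStrategy.APPROXIMATE;
  }
}
```

For example, with a cutoff of 0.05, a filter accepting 10 of 1000 docs would take the exact path, while one accepting 500 of 1000 would stay on the approximate path.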

The implementation of full (non-approximate) KNN (with a filter) needs only the 
{{VectorValues}} iterator, which the {{KnnVectorsReader}} already provides. It could be 
implemented as part of {{KnnVectorQuery}}. Is there a better place?
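
For reference, the exact search amounts to roughly the following: scan every accepted vector, score it against the target, and keep the best k in a min-heap. This is a sketch under stated assumptions, not the proposed implementation. A plain {{float[][]}} indexed by doc id stands in for the vector iterator, dot product stands in for whatever similarity the field is configured with, and {{ScoredDoc}} and {{ExactKnn}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical result holder: a doc id and its similarity score.
final class ScoredDoc {
  final int doc;
  final float score;

  ScoredDoc(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

final class ExactKnn {
  private ExactKnn() {}

  // Returns up to k accepted docs, best score first.
  static List<ScoredDoc> search(float[][] vectors, float[] target, int k, IntPredicate accept) {
    List<ScoredDoc> result = new ArrayList<>();
    if (k <= 0) {
      return result;
    }
    // Min-heap on score: the head is the weakest of the current top k.
    Comparator<ScoredDoc> byScore = (a, b) -> Float.compare(a.score, b.score);
    PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(byScore);
    for (int doc = 0; doc < vectors.length; doc++) {
      if (!accept.test(doc)) {
        continue; // deleted or filtered out
      }
      float score = dotProduct(vectors[doc], target);
      if (heap.size() < k) {
        heap.add(new ScoredDoc(doc, score));
      } else if (score > heap.peek().score) {
        heap.poll(); // evict the weakest hit
        heap.add(new ScoredDoc(doc, score));
      }
    }
    result.addAll(heap);
    result.sort((a, b) -> Float.compare(b.score, a.score));
    return result;
  }

  // Assumption: dot product as the similarity; higher is better.
  private static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

The cost is linear in the number of accepted docs, so the scan gets cheaper as the filter gets more selective, which is exactly where the approximate search struggles.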

> Allow KnnVectorQuery to operate over a subset of liveDocs
> ---------------------------------------------------------
>
>                 Key: LUCENE-10382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10382
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 9.0
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

