[ https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479014#comment-17479014 ]

Michael Sokolov commented on LUCENE-10382:
------------------------------------------

How would this look? An easy first step is to add a filter parameter to KnnVectorQuery:

{{public KnnVectorQuery(String field, float[] target, int k, Bits filter)}}

Then it can call {{LeafReader.searchNearestVectors}} with {{liveDocs.intersect(filter)}} instead of {{liveDocs}}.

[~julietibs] shared on the list a link to a paper showing how the search degenerates for highly selective filters. The authors' approach was to fall back to "brute force" KNN when selectivity passes a fixed threshold. We could do that too, and it makes sense to me, but I guess the question is: where should this fallback happen in the API?

The implementation of full (non-approximate) KNN with a filter only needs the VectorValues iterator, which the KnnVectorsReader already provides. It could be implemented as part of KnnVectorQuery. Is there a better place?

> Allow KnnVectorQuery to operate over a subset of liveDocs
> ---------------------------------------------------------
>
>                 Key: LUCENE-10382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10382
>             Project: Lucene - Core
>          Issue Type: Improvement
>   Affects Versions: 9.0
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.
> This ticket will change the interface to make it possible for the top K
> vectors to be selected from a subset of the live docs.
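
To make the fallback discussed in the comment above concrete, here is a minimal sketch of exact (non-approximate) KNN over a filtered subset of docs. It is not Lucene code and makes several assumptions: {{java.util.BitSet}} stands in for {{liveDocs}} intersected with the filter, a {{float[][]}} stands in for the VectorValues iterator, squared Euclidean distance stands in for whatever similarity the field uses, and the class name, method names, and {{threshold}} parameter are all hypothetical.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

/** Hypothetical illustration only: brute-force KNN restricted to accepted docs. */
final class ExactFilteredKnn {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * Assumed heuristic: prefer the exact scan when the accepted docs are a
   * small fraction of all docs. The threshold value is arbitrary here.
   */
  static boolean shouldUseExact(BitSet accepted, int maxDoc, double threshold) {
    return accepted.cardinality() < threshold * maxDoc;
  }

  /** Returns the ids of up to k accepted docs, nearest to the target first. */
  static int[] search(float[][] vectors, float[] target, int k, BitSet accepted) {
    if (k <= 0) {
      return new int[0];
    }
    float[] dist = new float[vectors.length];
    // Max-heap on distance: the head is the farthest of the current best k.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>((x, y) -> Float.compare(dist[y], dist[x]));
    for (int doc = accepted.nextSetBit(0);
        doc >= 0 && doc < vectors.length;
        doc = accepted.nextSetBit(doc + 1)) {
      dist[doc] = squaredDistance(vectors[doc], target);
      if (heap.size() < k) {
        heap.add(doc);
      } else if (dist[doc] < dist[heap.peek()]) {
        heap.poll(); // evict the current farthest hit
        heap.add(doc);
      }
    }
    // Polling yields farthest first, so fill the result from the back.
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll();
    }
    return result;
  }
}
```

The scan visits only accepted docs, so its cost grows with the number of docs that pass the filter rather than with the index size, which is why it suits the highly selective case. Where such a check would live in the API is exactly the open question in the comment; this sketch does not take a position on it.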