[ https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479227#comment-17479227 ]
Adrien Grand commented on LUCENE-10382:
---------------------------------------

We have queries like ParentChildrenBlockJoinQuery that take a {{BitSetProducer}} so that the query doesn't have to care about producing bit sets from queries; that is not its responsibility. In this case though, I think the decision should happen in Lucene, since it hopefully has more data to make the right decision than users have (e.g. {{Scorer#cost}}). If users knew in advance what filters they would like to apply, then they should split their indexes based on these filters instead of passing filters to Lucene.

I wonder if we could develop a cost model of both approaches, assuming a random distribution of deletions, filter matches and vector values, so that the query could compute the cost in both cases for specific values of {{k}}, {{Scorer#cost}} and {{LeafReader#numDeletedDocs}} (and maybe some HNSW-specific parameters like beamWidth?), and pick the approach that has the lower cost.

LRUQueryCache already happens to cache dense filters (cost > maxDoc / 100) as bit sets, which helps with conjunctions for instance, so maybe we would be able to reuse it to avoid recomputing bit sets over and over again for popular filters.

> Allow KnnVectorQuery to operate over a subset of liveDocs
> ---------------------------------------------------------
>
>                 Key: LUCENE-10382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10382
>             Project: Lucene - Core
>          Issue Type: Improvement
>   Affects Versions: 9.0
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.
> This ticket will change the interface to make it possible for the top K
> vectors to be selected from a subset of the live docs.
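To make the cost-model idea in the comment above more concrete, here is a rough sketch, not anything from Lucene itself. It assumes the two approaches are (a) an exact scan that scores every document accepted by the filter and (b) a graph search that skips documents the filter rejects; the message does not spell these out, so that reading is an assumption. The function names, the {{visit_factor}} parameter and the {{k * log2(maxDoc) / acceptance rate}} formula are hypothetical placeholders that would need to be replaced by measured numbers.

```python
import math


def estimate_costs(k, filter_cost, max_doc, num_deleted_docs, visit_factor=1.0):
    """Return (exact_cost, graph_cost) in units of vector comparisons.

    Hypothetical model: deletions and filter matches are assumed to be
    independent and uniformly distributed, as suggested in the comment.
    """
    if max_doc <= 0:
        raise ValueError("max_doc must be positive")

    # Fraction of documents that are not deleted.
    live_fraction = (max_doc - num_deleted_docs) / max_doc
    # Expected number of documents that match the filter and are live.
    accepted = filter_cost * live_fraction

    # Exact approach (assumed): one comparison per accepted document.
    exact_cost = accepted

    if accepted == 0:
        # Nothing can ever be collected, so the graph search never terminates early.
        graph_cost = math.inf
    else:
        # Graph approach (assumed): only a fraction of visited nodes is
        # acceptable, so roughly 1 / acceptance_rate visits per collected hit.
        acceptance_rate = accepted / max_doc
        graph_cost = visit_factor * k * math.log2(max_doc) / acceptance_rate

    return exact_cost, graph_cost


def pick_approach(k, filter_cost, max_doc, num_deleted_docs, visit_factor=1.0):
    """Pick the approach with the lower estimated cost."""
    exact_cost, graph_cost = estimate_costs(
        k, filter_cost, max_doc, num_deleted_docs, visit_factor
    )
    return "exact" if exact_cost <= graph_cost else "graph"
```

With these placeholder formulas, a very selective filter favors the exact scan and a dense filter favors the graph search; where the crossover actually lies would depend on HNSW parameters such as beamWidth, which this sketch folds into the single made-up {{visit_factor}}.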