[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479227#comment-17479227
 ] 

Adrien Grand commented on LUCENE-10382:
---------------------------------------

Queries like ParentChildrenBlockJoinQuery take a {{BitSetProducer}} because 
producing bit sets from queries is not their responsibility. In this case 
though, I think the decision should happen in Lucene, since it hopefully has 
more data to make the right decision than users have (e.g. Scorer#cost). If 
users know in advance which filters they want to apply, they should split 
their indexes based on these filters instead of passing filters to Lucene.

I wonder if we could develop a cost model of both approaches assuming a random 
distribution of deletions, filter matches and vector values, so that the query 
could compute the cost in both cases for specific values of {{k}}, 
{{Scorer#cost}} and {{LeafReader#numDeletedDocs}} (and maybe some HNSW-specific 
parameters like beamWidth?), and pick the approach that has the lesser cost.
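
As a rough illustration of what such a cost model could look like, here is a 
minimal sketch. Everything in it is an assumption made for the example: the 
class and method names are hypothetical and do not exist in Lucene, the exact 
scan is assumed to cost one vector comparison per accepted document, and the 
graph search is assumed to visit about k / acceptedFraction nodes at a cost of 
roughly log2(maxDoc) each. HNSW parameters such as beamWidth are not modeled.

```java
/** Hypothetical cost model for illustration only; none of this exists in Lucene. */
final class KnnFilterCostSketch {

  /** Expected number of accepted docs, assuming deletions are random and independent of the filter. */
  static double acceptedDocs(long filterCost, int maxDoc, int numDeletedDocs) {
    if (maxDoc <= 0) {
      throw new IllegalArgumentException("maxDoc must be positive");
    }
    double liveFraction = (double) (maxDoc - numDeletedDocs) / maxDoc;
    return Math.min(filterCost, maxDoc) * liveFraction;
  }

  /** Exact scan: assumed to cost one vector comparison per accepted doc. */
  static double exactCost(long filterCost, int maxDoc, int numDeletedDocs) {
    return acceptedDocs(filterCost, maxDoc, numDeletedDocs);
  }

  /** Graph search: assumed to visit about k / acceptedFraction nodes, each costing ~log2(maxDoc). */
  static double graphCost(int k, long filterCost, int maxDoc, int numDeletedDocs) {
    double accepted = acceptedDocs(filterCost, maxDoc, numDeletedDocs);
    if (accepted <= 0) {
      return Double.POSITIVE_INFINITY; // nothing can match, so the graph search never fills k
    }
    double acceptedFraction = accepted / maxDoc;
    double log2MaxDoc = Math.log(maxDoc) / Math.log(2);
    return (k / acceptedFraction) * Math.max(1.0, log2MaxDoc);
  }

  /** Picks the approach with the lower estimated cost. */
  static boolean preferExactSearch(int k, long filterCost, int maxDoc, int numDeletedDocs) {
    return exactCost(filterCost, maxDoc, numDeletedDocs)
        <= graphCost(k, filterCost, maxDoc, numDeletedDocs);
  }
}
```

Under these assumptions a very selective filter favors the exact scan, while a 
filter that matches most of the segment favors the graph search. The real 
constants would have to come from benchmarks.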

LRUQueryCache already caches dense filters (cost > maxDoc / 100) as bit sets, 
which helps with conjunctions for instance, so maybe we could reuse it to 
avoid recomputing bit sets over and over again for popular filters.
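
To make the density rule concrete, here is a small sketch of the threshold 
quoted above, together with materializing a filter's matches into a bit set. 
It uses java.util.BitSet rather than Lucene's own classes, and the class and 
method names are hypothetical; it is not how LRUQueryCache is implemented.

```java
import java.util.BitSet;

/** Hypothetical illustration of the density rule; not Lucene's implementation. */
final class DenseFilterSketch {

  /** Density rule as quoted in the comment: dense when cost > maxDoc / 100. */
  static boolean isDense(long cost, int maxDoc) {
    return cost > maxDoc / 100;
  }

  /** Materializes matching doc IDs into a bit set for constant-time membership checks. */
  static BitSet toBitSet(int[] matchingDocs, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    for (int doc : matchingDocs) {
      bits.set(doc);
    }
    return bits;
  }
}
```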

> Allow KnnVectorQuery to operate over a subset of liveDocs
> ---------------------------------------------------------
>
>                 Key: LUCENE-10382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10382
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 9.0
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.



