[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482113#comment-17482113
 ] 

Julie Tibshirani commented on LUCENE-10382:
-------------------------------------------

What do you think about breaking it into two steps? Each step seems okay to ship 
on its own.
1. Joel's PR, plus a very simple fallback strategy. In the query we could check 
if the bit set would exclude more than 85% of documents, and if so, use an 
exact scan instead. Based on my experiments with random filters, 85% is 
conservative, and we're unlikely to see a bad degradation at that point. In the 
worst case, we do an exact scan when we didn't need to, which means checking 15% 
of documents. We could document caveats like the ones Mike mentions.
2. Switch from a static check to a more robust one (maybe adaptive). I have 
some ideas here I'm excited to try out :) 
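A minimal Java sketch of the step-1 fallback. It assumes the filter is available as a `java.util.BitSet` over segment doc IDs and that vectors fit in a plain `float[][]`. Dot product stands in for whatever similarity the field actually uses. The class and method names (`FilteredKnnFallback`, `shouldUseExactScan`, `exactTopK`) are hypothetical and are not Lucene APIs. "Excludes more than 85%" is checked as "accepts fewer than 15%" using integer arithmetic, so the boundary is exact.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PriorityQueue;

// Hypothetical sketch of the step-1 static fallback; these names are NOT Lucene APIs.
public class FilteredKnnFallback {

    // Fall back when the filter excludes more than 85% of docs,
    // i.e. when fewer than 15% of docs are accepted.
    static final int MAX_ACCEPTED_PERCENT = 15;

    // True when acceptDocs admits fewer than 15% of maxDoc documents.
    // Integer arithmetic avoids floating-point rounding at the boundary.
    public static boolean shouldUseExactScan(BitSet acceptDocs, int maxDoc) {
        long accepted = acceptDocs.cardinality();
        return accepted * 100 < (long) maxDoc * MAX_ACCEPTED_PERCENT;
    }

    private static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Assumed similarity: plain dot product.
    private static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    // Exact scan: scores every accepted doc and keeps the k best in a min-heap.
    // Returns doc IDs ordered from best to worst score.
    public static int[] exactTopK(float[][] vectors, float[] query, BitSet acceptDocs, int k) {
        if (k <= 0) return new int[0];
        PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
        for (int doc = acceptDocs.nextSetBit(0);
             doc >= 0 && doc < vectors.length;
             doc = acceptDocs.nextSetBit(doc + 1)) {
            float score = dot(vectors[doc], query);
            if (heap.size() < k) {
                heap.offer(new ScoredDoc(doc, score));
            } else if (score > heap.peek().score) {
                heap.poll(); // evict the current worst of the top k
                heap.offer(new ScoredDoc(doc, score));
            }
        }
        int[] topDocs = new int[heap.size()];
        for (int i = topDocs.length - 1; i >= 0; i--) {
            topDocs[i] = heap.poll().doc; // min-heap yields the worst doc first
        }
        return topDocs;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {2f, 0f}, {3f, 0f}};
        BitSet filter = new BitSet(vectors.length);
        filter.set(0, 3); // accept docs 0..2, exclude doc 3
        int[] top = exactTopK(vectors, new float[] {1f, 0f}, filter, 2);
        System.out.println(Arrays.toString(top)); // [2, 0]
    }
}
```

With 100 documents, a filter that accepts 10 of them triggers the exact scan, while one that accepts exactly 15 does not, since the condition is "more than 85%" excluded. In that worst case the exact scan scores 15 documents, matching the bound described above.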

> Allow KnnVectorQuery to operate over a subset of liveDocs
> ---------------------------------------------------------
>
>                 Key: LUCENE-10382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10382
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 9.0
>            Reporter: Joel Bernstein
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)