[ https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482113#comment-17482113 ]
Julie Tibshirani commented on LUCENE-10382:
-------------------------------------------

What do you think about breaking it into two steps? These seem okay to ship on their own.

1. Joel's PR, plus a very simple fallback strategy. In the query we could check if the bit set would exclude more than 85% of documents, and if so, use an exact scan instead. Based on my experiments with random filters, 85% is conservative, and we're unlikely to see a bad degradation at that point. In the worst case, we do an exact scan when we didn't need to and check 15% of documents. We could document caveats like Mike mentions.
2. Switch from a static check to a more robust one (maybe adaptive). I have some ideas here I'm excited to try out :)

> Allow KnnVectorQuery to operate over a subset of liveDocs
> ---------------------------------------------------------
>
>                 Key: LUCENE-10382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10382
>             Project: Lucene - Core
>          Issue Type: Improvement
>   Affects Versions: 9.0
>            Reporter: Joel Bernstein
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.
> This ticket will change the interface to make it possible for the top K
> vectors to be selected from a subset of the live docs.
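
For illustration, the static check in step 1 might look roughly like the sketch below. It is plain Java using java.util.BitSet rather than Lucene's own classes; the class name FilteredKnnFallback, the method shouldUseExactScan, and its parameters are hypothetical and are not taken from Joel's PR. Only the 85% cutoff comes from the comment above.

```java
import java.util.BitSet;

public class FilteredKnnFallback {

    // Cutoff from the comment: fall back when the filter excludes
    // more than 85% of documents. The constant name is hypothetical.
    static final int EXCLUDED_PERCENT_THRESHOLD = 85;

    /**
     * Decides between graph search and an exact scan.
     *
     * @param acceptDocs bit i is set when document i passes the filter
     *                   (assumed to contain no bits at or beyond maxDoc)
     * @param maxDoc     total number of documents considered
     * @return true when an exact scan over the accepted documents is preferred
     */
    static boolean shouldUseExactScan(BitSet acceptDocs, int maxDoc) {
        if (maxDoc <= 0) {
            return true; // nothing to search, so an exact scan is trivially cheap
        }
        long accepted = acceptDocs.cardinality();
        long excluded = maxDoc - accepted;
        // Integer arithmetic avoids floating-point rounding at the boundary:
        // excluded / maxDoc > 85 / 100  <=>  excluded * 100 > 85 * maxDoc
        return excluded * 100L > (long) EXCLUDED_PERCENT_THRESHOLD * maxDoc;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;

        BitSet selective = new BitSet(maxDoc);
        selective.set(0, 100); // 10% accepted, 90% excluded

        BitSet broad = new BitSet(maxDoc);
        broad.set(0, 500); // 50% accepted, 50% excluded

        System.out.println("selective filter, exact scan? " + shouldUseExactScan(selective, maxDoc));
        System.out.println("broad filter, exact scan? " + shouldUseExactScan(broad, maxDoc));
    }
}
```

With a filter that excludes exactly 85% of documents the check returns false, since the comment says "more than 85%". When it returns true, the exact scan touches fewer than 15% of the documents, which matches the worst case described in step 1.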