[ https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551976#comment-17551976 ]
Kaival Parikh commented on LUCENE-10606:
----------------------------------------

Instead of collecting hit-by-hit using a LeafCollector, we can break down the search by instantiating a Weight, creating Scorers, and checking the iterator underlying each Scorer. If the iterator is backed by a BitSet, we can use that BitSet directly by reference (this is safe because we never modify it). Otherwise, we can create a new BitSet from the iterator using BitSet.of.

This way the collection is optimized. It can be advantageous because LRUQueryCache internally uses a BitSet, so such iterators will be common.

Sample [code|https://github.com/apache/lucene/compare/main...kaivalnp:alternate_collection]

> Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10606
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10606
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Kaival Parikh
>            Priority: Minor
>              Labels: performance
>
> While working on this [PR|https://github.com/apache/lucene/pull/932] to add
> prefilter testing support, we saw that hit collection took a long time for
> BitSetIterator backed scorers (due to iteration over the entire underlying
> BitSet, and copying it into an internal one). (Link to
> [numbers|https://github.com/apache/lucene/pull/932#discussion_r888896850],
> second table.)
> These BitSetIterators can be frequent (as they are used in LRUQueryCache),
> and bulk collection can be optimized with more knowledge of the underlying
> iterator.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
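
The idea in the comment (reuse the BitSet when the iterator is backed by one, otherwise build a new BitSet from the iterator) can be illustrated with a minimal plain-JDK sketch. This is not Lucene code and not the linked sample: DocIterator, BitSetBackedIterator, ArrayIterator and collect are hypothetical stand-ins (roughly for DocIdSetIterator, BitSetIterator and the BitSet.of fallback), and java.util.BitSet is used in place of Lucene's own BitSet so the example runs without any dependency.

```java
import java.util.BitSet;

public class PrefilterCollection {

    /** Hypothetical stand-in for a doc-id iterator (not the Lucene class). */
    interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Returns the next doc id, or NO_MORE_DOCS when exhausted. */
        int nextDoc();
    }

    /** Hypothetical iterator that walks a bit set and exposes it to callers. */
    static final class BitSetBackedIterator implements DocIterator {
        private final BitSet bits;
        private int doc = -1;

        BitSetBackedIterator(BitSet bits) {
            this.bits = bits;
        }

        BitSet getBitSet() {
            return bits;
        }

        @Override
        public int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // already exhausted; avoid overflowing doc + 1
            }
            int next = bits.nextSetBit(doc + 1);
            doc = (next == -1) ? NO_MORE_DOCS : next;
            return doc;
        }
    }

    /** Hypothetical iterator over sorted doc ids with no bit set behind it. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = 0;

        ArrayIterator(int... docs) {
            this.docs = docs;
        }

        @Override
        public int nextDoc() {
            return idx < docs.length ? docs[idx++] : NO_MORE_DOCS;
        }
    }

    /** Collects all matching docs of the iterator into a bit set. */
    static BitSet collect(DocIterator it, int maxDoc) {
        if (it instanceof BitSetBackedIterator) {
            // Fast path: share the reference instead of copying bit by bit.
            // Only safe because the caller treats the result as read-only.
            return ((BitSetBackedIterator) it).getBitSet();
        }
        // Fallback: drain the iterator into a freshly allocated bit set.
        BitSet bits = new BitSet(maxDoc);
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet cached = new BitSet(8);
        cached.set(1);
        cached.set(5);

        BitSet reused = collect(new BitSetBackedIterator(cached), 8);
        BitSet built = collect(new ArrayIterator(2, 3, 7), 8);

        System.out.println("same instance reused: " + (reused == cached));
        System.out.println("docs built from iterator: " + built);
    }
}
```

In this model the fast path does no per-document work at all, which is the saving the comment is after; the cost is that the returned set aliases the cached one, so it must never be written to.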