kaivalnp commented on code in PR #932: URL: https://github.com/apache/lucene/pull/932#discussion_r888896850
##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -225,6 +225,11 @@ public BitSetIterator getIterator(int contextOrd) {
       return new BitSetIterator(bitSets[contextOrd], cost[contextOrd]);
     }
+    public void setBitSet(BitSet bitSet, int cost) {
+      bitSets[ord] = bitSet;

Review Comment:
If we use the low-level `searchNearestVectors`, we won't automatically switch to `exactSearch` on reaching the limit of nodes visited (though we could duplicate the search code from `KnnVectorQuery` to achieve this). We might also hit some of the issues addressed [here](https://issues.apache.org/jira/browse/LUCENE-10504).

___

I tested two other approaches:

Using Reflection:
We forcibly update the variables using reflection. This has the advantage of not modifying classes for test purposes, but it makes the test somewhat hacky, and it might break silently with further changes to `KnnVectorQuery`.

selectivity | effective topK | post-filter recall | post-filter time | pre-filter recall | pre-filter time
-- | -- | -- | -- | -- | --
0.8 | 125 | 0.966 | 1.57 | 0.976 | 1.62
0.6 | 166 | 0.961 | 1.94 | 0.981 | 1.97
0.4 | 250 | 0.958 | 2.68 | 0.986 | 2.64
0.2 | 500 | 0.961 | 4.78 | 0.992 | 4.51
0.1 | 1000 | 0.956 | 8.54 | 0.995 | 7.78
0.01 | 10000 | 0.979 | 58.12 | 1.000 | 9.84

___

Modifying Collection:
Instead of collecting hit-by-hit using a `LeafCollector`, we can break down the search by instantiating a weight, creating scorers, and finally calling `BitSet.of` on each scorer's iterator. If the iterator is backed by a `BitSet`, the collection is optimized (this can be advantageous because `LRUQueryCache` internally uses a `BitSet`, so such iterators will be common).

Not completely sure of this one, looking for suggestions.
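To illustrate why bulk collection can beat hit-by-hit collection when the iterator is already backed by a bit set, here is a minimal, self-contained sketch. It does not use Lucene: `DocIterator`, `BitSetBackedIterator`, `ArrayIterator`, and `toBitSet` are hypothetical stand-ins for the roles played by the scorer's iterator, a bit-set-backed iterator, and `BitSet.of`, and `java.util.BitSet` stands in for Lucene's own bit set classes. The fast path assumes the iterator has not been advanced yet.

```java
import java.util.BitSet;

final class FilterBitSetSketch {
  /** Hypothetical stand-in for a doc-id iterator (not a Lucene class). */
  interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the next matching doc id, or NO_MORE_DOCS when exhausted. */
    int nextDoc();
  }

  /** Iterator that is already backed by a bit set, as cached filters often are. */
  static final class BitSetBackedIterator implements DocIterator {
    final BitSet bits;
    private int doc = -1;

    BitSetBackedIterator(BitSet bits) {
      this.bits = bits;
    }

    @Override
    public int nextDoc() {
      if (doc == NO_MORE_DOCS) {
        return doc; // stay exhausted; avoids overflow on doc + 1
      }
      int next = bits.nextSetBit(doc + 1);
      doc = (next < 0) ? NO_MORE_DOCS : next;
      return doc;
    }
  }

  /** Iterator over a sorted array of doc ids, with no bit set behind it. */
  static final class ArrayIterator implements DocIterator {
    private final int[] docs;
    private int i = 0;

    ArrayIterator(int[] docs) {
      this.docs = docs;
    }

    @Override
    public int nextDoc() {
      return i < docs.length ? docs[i++] : NO_MORE_DOCS;
    }
  }

  /** Builds a bit set of all matching docs from a fresh (unadvanced) iterator. */
  static BitSet toBitSet(DocIterator it, int maxDoc) {
    BitSet result = new BitSet(maxDoc);
    if (it instanceof BitSetBackedIterator) {
      // Fast path: OR the backing words in bulk instead of visiting each doc.
      result.or(((BitSetBackedIterator) it).bits);
      return result;
    }
    // Slow path: hit-by-hit, comparable to collecting through a per-document callback.
    for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      result.set(doc);
    }
    return result;
  }
}
```

Both paths produce the same set of accepted documents, and `result.cardinality()` could then serve as the cost value that accompanies the bit set. Only the amount of per-document work differs between the two paths.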
Sample [code](https://github.com/apache/lucene/compare/main...kaivalnp:alternate_collection)

selectivity | effective topK | post-filter recall | post-filter time | pre-filter recall | pre-filter time
-- | -- | -- | -- | -- | --
0.8 | 125 | 0.961 | 1.56 | 0.976 | 1.64
0.6 | 166 | 0.960 | 1.93 | 0.980 | 1.98
0.4 | 250 | 0.961 | 2.68 | 0.987 | 2.68
0.2 | 500 | 0.960 | 4.76 | 0.991 | 4.55
0.1 | 1000 | 0.961 | 8.50 | 0.995 | 7.84
0.01 | 10000 | 0.953 | 58.17 | 1.000 | 9.71

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org