kaivalnp commented on code in PR #932:
URL: https://github.com/apache/lucene/pull/932#discussion_r888896850


##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -225,6 +225,11 @@ public BitSetIterator getIterator(int contextOrd) {
       return new BitSetIterator(bitSets[contextOrd], cost[contextOrd]);
     }
 
+    public void setBitSet(BitSet bitSet, int cost) {
+      bitSets[ord] = bitSet;

Review Comment:
   If we use the low-level `searchNearestVectors`, we won't automatically 
switch to `exactSearch` when the limit on visited nodes is reached (though we 
could duplicate the search code from `KnnVectorQuery` to achieve this).
   We might also run into some of the issues addressed in 
[LUCENE-10504](https://issues.apache.org/jira/browse/LUCENE-10504)?
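   
   To make the fallback concern concrete, here is a minimal, self-contained sketch of the "approximate search, then exact search once the visit limit is hit" pattern. It is a toy stand-in and not the code in `KnnVectorQuery`: the class `FallbackSearchSketch`, the per-document `scores` array, and the linear "approximate" traversal are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Toy stand-in for an approximate search with an exact fallback; not Lucene code. */
class FallbackSearchSketch {

  /** Hypothetical approximate search: gives up once visitedLimit docs have been visited. */
  static Optional<List<Integer>> approximateSearch(
      float[] scores, BitSet accept, int k, int visitedLimit) {
    List<Integer> accepted = new ArrayList<>();
    for (int doc = 0; doc < scores.length; doc++) {
      if (doc >= visitedLimit) {
        return Optional.empty(); // limit reached before the traversal finished
      }
      if (accept.get(doc)) {
        accepted.add(doc);
      }
    }
    return Optional.of(topK(accepted, scores, k));
  }

  /** Exact search: score every accepted doc and keep the k best. */
  static List<Integer> exactSearch(float[] scores, BitSet accept, int k) {
    List<Integer> accepted = new ArrayList<>();
    for (int doc = accept.nextSetBit(0);
        doc >= 0 && doc < scores.length;
        doc = accept.nextSetBit(doc + 1)) {
      accepted.add(doc);
    }
    return topK(accepted, scores, k);
  }

  /** The fallback that a caller of a low-level search would have to reproduce itself. */
  static List<Integer> search(float[] scores, BitSet accept, int k, int visitedLimit) {
    return approximateSearch(scores, accept, k, visitedLimit)
        .orElseGet(() -> exactSearch(scores, accept, k));
  }

  /** Sorts the candidate docs by descending score and keeps at most k of them. */
  private static List<Integer> topK(List<Integer> docs, float[] scores, int k) {
    List<Integer> sorted = new ArrayList<>(docs);
    sorted.sort(Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
  }
}
```

   Calling only `approximateSearch` here corresponds to using the low-level API directly: a caller gets no results once the limit is hit unless it adds the `search` wrapper itself.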
   
   ___
   
   I tested two other approaches:
   
   Using reflection: we forcibly update the private variables via reflection. 
This has the advantage of not modifying classes for test purposes, but it makes 
the test somewhat hacky, and it might break silently with further changes to 
`KnnVectorQuery`.
   
   selectivity | effective topK | post-filter recall | post-filter time | pre-filter recall | pre-filter time
   -- | -- | -- | -- | -- | --
   0.8 | 125 | 0.966 | 1.57 | 0.976 | 1.62
   0.6 | 166 | 0.961 | 1.94 | 0.981 | 1.97
   0.4 | 250 | 0.958 | 2.68 | 0.986 | 2.64
   0.2 | 500 | 0.961 | 4.78 | 0.992 | 4.51
   0.1 | 1000 | 0.956 | 8.54 | 0.995 | 7.78
   0.01 | 10000 | 0.979 | 58.12 | 1.000 | 9.84
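   
   For reference, the reflection approach described above can be sketched roughly as follows. This is a generic illustration rather than the test code used for these numbers: `ReflectionSketch`, `Holder`, and the field name `cost` are hypothetical stand-ins, not members of `KnnVectorQuery`.

```java
import java.lang.reflect.Field;

/** Generic illustration of overwriting private state from a test; names are hypothetical. */
class ReflectionSketch {

  /** Stand-in for a class whose private state a test wants to overwrite. */
  static class Holder {
    private int cost = -1;

    int cost() {
      return cost;
    }
  }

  /** Overwrites a private field by name, bypassing normal access checks. */
  static void forceSet(Object target, String fieldName, Object value) {
    try {
      Field field = target.getClass().getDeclaredField(fieldName);
      field.setAccessible(true);
      field.set(target, value);
    } catch (ReflectiveOperationException e) {
      // The field name is only a string: a rename in the target class compiles
      // fine and is detected here, at run time, rather than by the compiler.
      throw new IllegalStateException("Could not set field " + fieldName, e);
    }
  }
}
```

   Because the field is looked up by a string, the compiler cannot catch a later rename or type change, which is the fragility mentioned above.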
   
   ___
   
   Modifying collection: instead of collecting hit-by-hit using a 
`LeafCollector`, we can break down the search by instantiating a weight, 
creating scorers, and finally calling `BitSet.of` on each scorer's iterator. If 
the iterator is backed by a `BitSet`, the collection is optimized (this can be 
advantageous because `LRUQueryCache` internally uses a `BitSet`, so such 
iterators will be common).
   I'm not completely sure about this one and am looking for suggestions. Sample 
[code](https://github.com/apache/lucene/compare/main...kaivalnp:alternate_collection)
   
   selectivity | effective topK | post-filter recall | post-filter time | pre-filter recall | pre-filter time
   -- | -- | -- | -- | -- | --
   0.8 | 125 | 0.961 | 1.56 | 0.976 | 1.64
   0.6 | 166 | 0.960 | 1.93 | 0.980 | 1.98
   0.4 | 250 | 0.961 | 2.68 | 0.987 | 2.68
   0.2 | 500 | 0.960 | 4.76 | 0.991 | 4.55
   0.1 | 1000 | 0.961 | 8.50 | 0.995 | 7.84
   0.01 | 10000 | 0.953 | 58.17 | 1.000 | 9.71
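   
   The idea behind the modified collection can be illustrated with a small stand-in that uses `java.util.BitSet` instead of Lucene's classes. It is only a sketch of why a bit-set-backed iterator can be collected faster than hit-by-hit collection: `DocIterator`, `BitSetBackedIterator`, `ArrayIterator`, and `collect` are hypothetical names, and the code does not claim to mirror how Lucene's `BitSet.of` is implemented.

```java
import java.util.BitSet;

/** Toy illustration of bulk vs. hit-by-hit collection; not Lucene code. */
class BitSetCollectionSketch {

  /** Minimal doc-id iterator, a hypothetical stand-in for a scorer's iterator. */
  interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int nextDoc();
  }

  /** Iterator backed by a bit set, as a cached filter might be. */
  static final class BitSetBackedIterator implements DocIterator {
    final BitSet bits;
    private int doc = -1;

    BitSetBackedIterator(BitSet bits) {
      this.bits = bits;
    }

    @Override
    public int nextDoc() {
      if (doc == NO_MORE_DOCS) {
        return doc;
      }
      int next = bits.nextSetBit(doc + 1);
      doc = next < 0 ? NO_MORE_DOCS : next;
      return doc;
    }
  }

  /** Iterator over a sorted array of doc ids, with no bit set behind it. */
  static final class ArrayIterator implements DocIterator {
    private final int[] docs;
    private int index = 0;

    ArrayIterator(int[] docs) {
      this.docs = docs;
    }

    @Override
    public int nextDoc() {
      return index < docs.length ? docs[index++] : NO_MORE_DOCS;
    }
  }

  /** Collects all docs of an iterator into a bit set of capacity maxDoc. */
  static BitSet collect(DocIterator it, int maxDoc) {
    BitSet result = new BitSet(maxDoc);
    if (it instanceof BitSetBackedIterator) {
      // Fast path: copy the backing bits in bulk instead of one doc at a time.
      result.or(((BitSetBackedIterator) it).bits);
      return result;
    }
    // Slow path: hit-by-hit, similar in spirit to a per-document collector.
    for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      result.set(doc);
    }
    return result;
  }
}
```

   Both paths produce the same set of documents; only the cost of building it differs.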



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

