kaivalnp commented on code in PR #932:
URL: https://github.com/apache/lucene/pull/932#discussion_r888849991


##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -225,6 +225,11 @@ public BitSetIterator getIterator(int contextOrd) {
       return new BitSetIterator(bitSets[contextOrd], cost[contextOrd]);
     }
 
+    public void setBitSet(BitSet bitSet, int cost) {
+      bitSets[ord] = bitSet;

Review Comment:
   Here are some related numbers. This is the baseline (with bulk collection):
   
   selectivity | effective topK | post-filter recall | post-filter time | pre-filter recall | pre-filter time
   -- | -- | -- | -- | -- | --
   0.8 | 125 | 0.964 | 1.58 | 0.975 | 1.60
   0.6 | 166 | 0.962 | 1.94 | 0.981 | 1.97
   0.4 | 250 | 0.960 | 2.70 | 0.986 | 2.64
   0.2 | 500 | 0.963 | 4.76 | 0.991 | 4.51
   0.1 | 1000 | 0.957 | 8.53 | 0.995 | 7.78
   0.01 | 10000 | 0.961 | 58.28 | 1.000 | 9.58
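
   The `effective topK` column is not defined in this comment. Its values are consistent with dividing a base topK of 100 by the selectivity and truncating, so that roughly 100 hits survive the post-filter. That reading is an assumption inferred from the table, and the class and method names below are hypothetical. A minimal sketch:

```java
public class EffectiveTopK {
  // Assumed relation (inferred from the table, not stated in the PR):
  //   effectiveTopK = floor(topK / selectivity)
  // Selectivity is passed in basis points (0.8 -> 8000) so the
  // arithmetic stays in integers and avoids floating-point rounding.
  static int effectiveTopK(int topK, int selectivityBasisPoints) {
    return (int) ((long) topK * 10_000 / selectivityBasisPoints);
  }

  public static void main(String[] args) {
    int[] basisPoints = {8000, 6000, 4000, 2000, 1000, 100};
    for (int bp : basisPoints) {
      System.out.println(bp / 10_000.0 + " -> " + effectiveTopK(100, bp));
    }
  }
}
```

   With an assumed base topK of 100 this reproduces all six values in the column (125, 166, 250, 500, 1000, 10000).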
   
   I removed the overridden `BulkScorer` and made `#scorer` return a `ConstantScoreScorer` wrapping the `BitSetIterator` of our query, much like the `BitSetQuery` that you mentioned. This removes the bulk collection optimization and switches to doc-by-doc collection. Here are the numbers:
   
   selectivity | effective topK | post-filter recall | post-filter time | pre-filter recall | pre-filter time
   -- | -- | -- | -- | -- | --
   0.8 | 125 | 0.967 | 1.55 | 0.976 | 19.65
   0.6 | 166 | 0.964 | 1.94 | 0.981 | 17.79
   0.4 | 250 | 0.961 | 2.69 | 0.986 | 14.71
   0.2 | 500 | 0.958 | 4.78 | 0.992 | 11.19
   0.1 | 1000 | 0.959 | 8.53 | 0.994 | 11.50
   0.01 | 10000 | 0.937 | 58.32 | 1.000 | 10.34
   
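   Pre-filter time is much higher when more docs pass the filter and are collected one by one: 19.65 at selectivity 0.8 versus 1.60 with bulk collection. The gap narrows as selectivity drops (10.34 versus 9.58 at 0.01).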
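
   To put the gap in relative terms, the pre-filter times from the two tables can be divided row by row. The sketch below only restates the numbers reported above; the class and method names are hypothetical and it does not re-run any benchmark.

```java
public class PreFilterSlowdown {
  // Pre-filter times copied from the two tables above (units as reported there).
  static final double[] SELECTIVITY = {0.8, 0.6, 0.4, 0.2, 0.1, 0.01};
  static final double[] BULK = {1.60, 1.97, 2.64, 4.51, 7.78, 9.58};
  static final double[] DOC_BY_DOC = {19.65, 17.79, 14.71, 11.19, 11.50, 10.34};

  // Ratio of doc-by-doc pre-filter time to the bulk-collection baseline.
  static double slowdown(int row) {
    return DOC_BY_DOC[row] / BULK[row];
  }

  public static void main(String[] args) {
    for (int i = 0; i < SELECTIVITY.length; i++) {
      System.out.printf("selectivity %.2f: %.1fx%n", SELECTIVITY[i], slowdown(i));
    }
  }
}
```

   The ratio falls steadily from roughly 12x at selectivity 0.8 to about 1.1x at 0.01, which fits the observation that the cost grows with the number of docs collected individually.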




