benwtrent commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1957923215

   I have done some more benchmarking, and there isn't really a significant 
improvement. This is over 500k vectors with 1024 dimensions, retrieving the 500 
nearest neighbors. 
   
   Baseline
   ```
   latency  nDoc    filter
   3.11     500000  0.0060  pre-filter
   3.11     500000  0.0059  pre-filter
   2.94     500000  0.0058  pre-filter
   2.90     500000  0.0057  pre-filter
   2.81     500000  0.0056  pre-filter
   2.77     500000  0.0055  pre-filter
   2.65     500000  0.0054  pre-filter
   2.80     500000  0.0053  pre-filter
   ```
   
   Candidate (this uses a `FixedBitSet` in a collector to keep track of visited vectors)
   ```
   latency  nDoc    filter
   2.94     500000  0.0060  pre-filter
   2.90     500000  0.0059  pre-filter
   2.87     500000  0.0058  pre-filter
   2.94     500000  0.0057  pre-filter
   2.70     500000  0.0056  pre-filter
   2.60     500000  0.0055  pre-filter
   2.63     500000  0.0054  pre-filter
   2.59     500000  0.0053  pre-filter
   ```
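   
   To make "tracking visited in a collector" concrete, here is a minimal sketch. It assumes the collector owns a dense bitset with one bit per doc in the segment and ignores any doc it has already seen. `VisitedTrackingCollector` and its methods are hypothetical names for illustration, not Lucene's `KnnCollector` API. The `long[]` bitset only mimics the layout of `FixedBitSet`, so the example runs without Lucene on the classpath.
   
   ```java
   import java.util.PriorityQueue;

   // Hypothetical sketch, NOT Lucene's KnnCollector API: keeps the k best-scoring
   // docs and uses a dense, FixedBitSet-style bitset (one bit per doc in the
   // segment) so a doc reached twice during graph search is collected only once.
   final class VisitedTrackingCollector {
       private static final class ScoredDoc {
           final int doc;
           final float score;
           ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
       }

       private final long[] visitedWords; // ceil(maxDoc / 64) longs, allocated up front
       private final int k;
       private final PriorityQueue<ScoredDoc> topK =
           new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score)); // min-heap on score
       private int visitedCount;

       VisitedTrackingCollector(int maxDoc, int k) {
           this.visitedWords = new long[(maxDoc + 63) >>> 6];
           this.k = k;
       }

       /** Marks doc as visited and reports whether it was already marked. */
       boolean getAndSet(int doc) {
           int word = doc >>> 6;
           long mask = 1L << doc; // shift count is taken mod 64, i.e. doc's low 6 bits
           boolean wasSet = (visitedWords[word] & mask) != 0;
           visitedWords[word] |= mask;
           return wasSet;
       }

       /** Offers a scored doc; returns false (and does nothing) if it was already visited. */
       boolean collect(int doc, float score) {
           if (getAndSet(doc)) {
               return false;
           }
           visitedCount++;
           if (topK.size() < k) {
               topK.add(new ScoredDoc(doc, score));
           } else if (score > topK.peek().score) {
               topK.poll();
               topK.add(new ScoredDoc(doc, score));
           }
           return true;
       }

       int visitedCount() { return visitedCount; }

       int size() { return topK.size(); }

       /** Lowest score still in the top k, or -infinity while fewer than k docs are held. */
       float minCompetitiveScore() {
           return topK.size() < k ? Float.NEGATIVE_INFINITY : topK.peek().score;
       }
   }
   ```
   
   Note the constructor: the `long[]` is sized from `maxDoc` before a single doc is scored, so its cost scales with the segment rather than with how much of the graph the search actually explores.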
   
   Note that this uses a `FixedBitSet` that allocates enough space to track every 
vector in the segment, which can get very expensive on large segments. I also tried a 
`SparseBitSet`, but at my scale of only 500k docs, it was actually slower than the baseline. 
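   
   The trade-off between the two kinds of bitset can be sketched as follows. This assumes a simplified sparse set that allocates 4096-bit blocks only when a doc in that block is first visited. `SparseVisitedSet`, `approxBytes`, and `denseBytes` are hypothetical helpers, not Lucene's implementation. The byte counts assume 8-byte references and ignore object headers.
   
   ```java
   // Hypothetical sketch, NOT Lucene code: a visited set that allocates 4096-bit
   // blocks lazily, compared against the up-front cost of a dense long[] bitset.
   final class SparseVisitedSet {
       private static final int BLOCK_BITS = 4096;
       private static final int WORDS_PER_BLOCK = BLOCK_BITS / 64; // 64 longs per block

       private final long[][] blocks; // a slot stays null until its block is touched
       private int allocatedBlocks;

       SparseVisitedSet(int maxDoc) {
           this.blocks = new long[(maxDoc + BLOCK_BITS - 1) / BLOCK_BITS][];
       }

       /** Marks doc as visited and reports whether it was already marked. */
       boolean getAndSet(int doc) {
           int b = doc / BLOCK_BITS;
           long[] block = blocks[b];
           if (block == null) { // extra branch and indirection a dense bitset does not need
               block = new long[WORDS_PER_BLOCK];
               blocks[b] = block;
               allocatedBlocks++;
           }
           int word = (doc % BLOCK_BITS) >>> 6;
           long mask = 1L << doc; // low 6 bits of doc select the bit within the word
           boolean wasSet = (block[word] & mask) != 0;
           block[word] |= mask;
           return wasSet;
       }

       /** Rough payload estimate: 8 bytes per block slot plus 8 bytes per allocated long. */
       long approxBytes() {
           return 8L * blocks.length + 8L * WORDS_PER_BLOCK * allocatedBlocks;
       }

       /** Payload bytes of a dense, FixedBitSet-style long[] with one bit per doc. */
       static long denseBytes(int maxDoc) {
           return 8L * ((maxDoc + 63) >>> 6);
       }
   }
   ```
   
   With `maxDoc = 500_000`, `denseBytes` is 62,504 bytes, and for a 10M-doc segment it grows to 1,250,000 bytes each time it is allocated. The sparse sketch stays at a few KB when only a handful of blocks are touched. In exchange, it pays for a null check and an extra array hop on every `getAndSet`.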
   
   It just shows that the gains here may be very slim :/
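   
   One way to put a number on "very slim" is to average the latency columns of the Baseline and Candidate tables. This sketch is plain arithmetic over the posted numbers, and the class name `LatencySummary` is only for illustration. It is not a significance test and says nothing about run-to-run variance.
   
   ```java
   // Latency columns copied from the Baseline and Candidate tables,
   // row by row in the same filter order (0.0060 down to 0.0053).
   final class LatencySummary {
       static final double[] BASELINE = {3.11, 3.11, 2.94, 2.90, 2.81, 2.77, 2.65, 2.80};
       static final double[] CANDIDATE = {2.94, 2.90, 2.87, 2.94, 2.70, 2.60, 2.63, 2.59};

       static double mean(double[] xs) {
           double sum = 0;
           for (double x : xs) {
               sum += x;
           }
           return sum / xs.length;
       }

       /** (candidate - baseline) / baseline on the means; negative means the candidate is faster. */
       static double relativeChange(double[] baseline, double[] candidate) {
           double b = mean(baseline);
           return (mean(candidate) - b) / b;
       }

       /** Counts rows where the candidate latency is higher than the baseline latency. */
       static int slowerRows(double[] baseline, double[] candidate) {
           int n = 0;
           for (int i = 0; i < baseline.length; i++) {
               if (candidate[i] > baseline[i]) {
                   n++;
               }
           }
           return n;
       }
   }
   ```
   
   On these numbers, the candidate's mean latency is about 2.771 versus 2.886 for the baseline, roughly 4% lower. The candidate is also slower in one of the eight rows (filter 0.0057).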



