jpountz commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1805727513

   Thanks, the numbers make more sense to me now.
   
   Intuitively, `FixedBitSet` performs better when a large percentage of nodes 
needs to be visited and `SparseFixedBitSet` performs better otherwise. 
Practically, the smaller segments of an index should probably always use a 
`FixedBitSet`. For example, a simple threshold could be to use 
`SparseFixedBitSet` when we expect it to use less memory than `FixedBitSet`, 
i.e. when fewer than 1/64 ≈ 1.6% of the nodes get visited, and `FixedBitSet` 
otherwise. The threshold could possibly be a bit lower: if both use similar 
amounts of memory, it probably makes sense to bias towards `FixedBitSet`. 
I see that your benchmark visits between 2.0% and 6.9% of the nodes 
on GLOVE and between 0.8% and 5.0% on Cohere, so it makes sense to me that 
`FixedBitSet` performs better.
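   
   As a rough sketch of that threshold (not code from the PR): the class, 
enum, and `bias` parameter below are hypothetical names, and the memory model 
is an assumption, namely one bit per node for the dense set and, in the worst 
case, about one `long` per visited node for the sparse set.

```java
// Hypothetical helper, not part of Lucene: picks a visited-set kind from a
// rough memory estimate expressed in 64-bit words.
final class VisitedSetChooser {
    enum Kind { DENSE, SPARSE }

    // numNodes: number of nodes in the graph of this segment.
    // expectedVisited: estimated number of nodes a search will visit.
    // bias: >= 1; larger values favor the dense set when costs are close.
    static Kind choose(long numNodes, long expectedVisited, long bias) {
        // Dense bit set: one bit per node, i.e. ceil(numNodes / 64) longs.
        long denseLongs = (numNodes + 63) / 64;
        // Sparse bit set (assumed worst case): every visited node lands in
        // a different long, so roughly one long per visited node.
        long sparseLongs = expectedVisited * bias;
        return sparseLongs < denseLongs ? Kind.SPARSE : Kind.DENSE;
    }

    public static void main(String[] args) {
        // 0.5% of 1M nodes visited: below 1/64, so the sparse set is chosen.
        System.out.println(choose(1_000_000, 5_000, 1));
        // 2% of 1M nodes visited: above 1/64, so the dense set is chosen.
        System.out.println(choose(1_000_000, 20_000, 1));
    }
}
```

   With `bias = 1` this is exactly the 1/64 cut-off; a larger `bias` moves the 
cut-off lower, which matches the idea of preferring `FixedBitSet` when both 
would use similar amounts of memory.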
   
   Is it possible to estimate the order of magnitude of the number of nodes 
that a nearest-neighbor search needs to visit, so that we could use it as a 
threshold?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

