benwtrent closed pull request #12789: Improve vector search speed by using
FixedBitSet
URL: https://github.com/apache/lucene/pull/12789
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
jpountz commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1813030726
++ This feels similar to `IndexOrDocValuesQuery`: we probably can't guess
the absolute best threshold, but we can probably figure out something that is
right more often than wrong. Hopef
benwtrent commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1805943735
@jpountz Searching scales logarithmically, but we do have to explore more if
there are any pre-filtered nodes.
We can run some experiments to determine the appropriate threshold.
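To make the threshold idea concrete, here is a hypothetical Java sketch. It is not Lucene code, and `VisitedSetChooser`, `estimateVisited`, and `choose` are illustrative names. It assumes, crudely, that the number of visited nodes grows like k * log2(graphSize) and is inflated by 1/selectivity when a filter rejects nodes. It then picks a dense bitset once the expected visited fraction crosses a threshold, which is exactly the value the experiments would have to calibrate.

```java
/**
 * Hypothetical heuristic for choosing a visited-set implementation before a
 * graph search. Nothing here is Lucene API: the cost model and the threshold
 * are assumptions meant to be calibrated by benchmarks.
 */
public class VisitedSetChooser {

    public enum VisitedSetKind { DENSE, SPARSE }

    /**
     * Assumed, rough estimate of how many graph nodes a search will visit:
     * proportional to k * log2(graphSize), inflated when a filter rejects
     * nodes (selectivity = fraction of nodes that pass the filter).
     */
    public static double estimateVisited(int graphSize, int k, double selectivity) {
        double base = k * (Math.log(graphSize) / Math.log(2));
        double inflated = base / Math.max(selectivity, 1e-9);
        // A search cannot visit more nodes than the graph contains.
        return Math.min(inflated, graphSize);
    }

    /** Picks DENSE when the expected visited fraction reaches the threshold. */
    public static VisitedSetKind choose(int graphSize, int k, double selectivity, double threshold) {
        double fraction = estimateVisited(graphSize, k, selectivity) / graphSize;
        return fraction >= threshold ? VisitedSetKind.DENSE : VisitedSetKind.SPARSE;
    }

    public static void main(String[] args) {
        double threshold = 0.01; // placeholder; the real value would come from experiments
        System.out.println(choose(1_000_000, 100, 1.0, threshold));  // prints SPARSE
        System.out.println(choose(1_000_000, 100, 0.01, threshold)); // prints DENSE
    }
}
```

Like the cost-based choice in `IndexOrDocValuesQuery`, such a rule only needs to be right more often than wrong; a poor estimate costs some speed or memory, not correctness.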
jpountz commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1805727513
Thanks, the numbers make more sense to me now.
Intuitively, `FixedBitSet` performs better when a large percentage of nodes
needs to be visited and `SparseFixedBitSet` performs bett
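One rough way to check this intuition is to compare memory footprints as the visited fraction grows. The Java sketch below is a simplified model, not Lucene's implementation. The dense side stores one 64-bit word per 64 nodes. The sparse side loosely follows the layout described in the `SparseFixedBitSet` javadoc: 4096-bit blocks, a 64-bit index per block, and only the non-zero words of each block stored. The class name `BitSetMemoryModel` is illustrative, and the model ignores object headers and array over-allocation, so real numbers will differ.

```java
import java.util.Random;

/**
 * Simplified memory model: dense bitset vs. block-sparse bitset.
 * Assumption: object headers and array growth are ignored.
 */
public class BitSetMemoryModel {

    /** Bytes for a dense bitset: one long per 64 bits, however many are set. */
    public static long denseBytes(int numBits) {
        long words = (numBits + 63L) / 64;
        return words * Long.BYTES;
    }

    /** Bytes for a block-sparse bitset that stores only non-zero words. */
    public static long sparseBytes(int numBits, int[] setBits) {
        int numBlocks = (numBits + 4095) / 4096;
        // One long per block records which of its 64 words are non-zero.
        long[] blockIndex = new long[numBlocks];
        for (int bit : setBits) {
            int block = bit >>> 12;             // 4096 bits per block
            int wordInBlock = (bit >>> 6) & 63; // 64 words per block
            blockIndex[block] |= 1L << wordInBlock;
        }
        long nonZeroWords = 0;
        for (long index : blockIndex) {
            nonZeroWords += Long.bitCount(index);
        }
        // Index words for every block plus the stored non-zero words.
        return (long) numBlocks * Long.BYTES + nonZeroWords * Long.BYTES;
    }

    public static void main(String[] args) {
        int numBits = 1_000_000; // hypothetical graph size
        Random random = new Random(42);
        for (double fraction : new double[] {0.001, 0.01, 0.1, 0.5}) {
            int count = (int) (numBits * fraction);
            int[] visited = new int[count];
            for (int i = 0; i < count; i++) {
                visited[i] = random.nextInt(numBits); // duplicates are harmless
            }
            System.out.printf("visited=%.1f%% dense=%d bytes sparse=%d bytes%n",
                    fraction * 100, denseBytes(numBits), sparseBytes(numBits, visited));
        }
    }
}
```

In this model the sparse layout is far smaller when few nodes are visited, but with randomly scattered visits nearly every word becomes non-zero well before most nodes are visited, at which point the sparse layout is slightly larger than the dense one (by one index word per block). Per-array overhead in a real implementation would add to that gap.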
benwtrent commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1804203048
@jpountz I re-ran my tests and double-checked my numbers, and I have some
corrections: I accidentally double-counted sparse sizes, so the previous
numbers are 2x too big.
GLOVE-100-100_
jpountz commented on PR #12789:
URL: https://github.com/apache/lucene/pull/12789#issuecomment-1804146598
I can believe that FixedBitSet is faster in some cases, but it's surprising
to me that the memory usage of SparseFixedBitSet can go up to 2x that of
FixedBitSet; this makes me wonder if