benwtrent commented on PR #12820: URL: https://github.com/apache/lucene/pull/12820#issuecomment-1957923215
I have done some more benchmarking and there isn't really a significant improvement. This is over 500k vectors with 1024 dimensions, getting the nearest 500 neighbors.

Baseline
```
latency  nDoc    filter
3.11     500000  0.0060 pre-filter
3.11     500000  0.0059 pre-filter
2.94     500000  0.0058 pre-filter
2.90     500000  0.0057 pre-filter
2.81     500000  0.0056 pre-filter
2.77     500000  0.0055 pre-filter
2.65     500000  0.0054 pre-filter
2.80     500000  0.0053 pre-filter
```

Candidate (this is using a FixedBitSet to keep track of visited vectors in a collector)
```
latency  nDoc    filter
2.94     500000  0.0060 pre-filter
2.90     500000  0.0059 pre-filter
2.87     500000  0.0058 pre-filter
2.94     500000  0.0057 pre-filter
2.70     500000  0.0056 pre-filter
2.60     500000  0.0055 pre-filter
2.63     500000  0.0054 pre-filter
2.59     500000  0.0053 pre-filter
```

Note that this is with a FixedBitSet that allocates enough space to track every vector, which can get very expensive on large segments. I tried a `SparseBitSet`, but at my scale of only 500k docs it was actually slower than baseline.

It just shows that the margins of the gain here may be very slim :/
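
For a rough sense of the allocation cost mentioned above: a dense bit set needs one bit per vector in the segment, no matter how few are actually visited. Below is a minimal sketch that uses a plain `long[]` as a stand-in for a dense bit set such as Lucene's `FixedBitSet`. The class `DenseVisited` and its methods are hypothetical names for illustration, not the code from this PR.

```java
// Hypothetical stand-in for a dense "visited" set; not the code from this PR.
public class DenseVisited {
    private final long[] words;

    public DenseVisited(int numVectors) {
        // One bit per vector, rounded up to whole 64-bit words.
        this.words = new long[(numVectors + 63) >>> 6];
    }

    /** Marks ord as visited; returns true if it had not been visited before. */
    public boolean visit(int ord) {
        int word = ord >>> 6;
        long mask = 1L << ord; // Java uses only the low 6 bits of the shift count for long
        boolean firstVisit = (words[word] & mask) == 0;
        words[word] |= mask;
        return firstVisit;
    }

    /** Bytes used by the backing array, ignoring object headers. */
    public long backingBytes() {
        return (long) words.length * Long.BYTES;
    }

    public static void main(String[] args) {
        DenseVisited visited = new DenseVisited(500_000);
        System.out.println(visited.visit(42));      // true
        System.out.println(visited.visit(42));      // false
        System.out.println(visited.backingBytes()); // 62504
    }
}
```

At 500k vectors the backing array is about 61 KiB per allocation, and it grows linearly with segment size whether or not most vectors are ever visited.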