benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1788822945
So, I replicated the JVector benchmark (the Lucene part) using the new int8 quantization. Note, this is with `0` fan-out and no extra top-k gathered. Since the JVector benchmark didn't specify any recall target, etc., I just used the absolute baseline of `top-100`. I reserved 12GB for heap, leaving at most 30GB for off-heap memory.

```
1 thread over 37 segments:
completed 1000 searches in 18411 ms: 54 QPS CPU time=18231ms
checking results
0.777  18.23  100000000  0  16  100  100  0  1.00  post-filter
```

Since kNN allows the segments to be searched in parallel, I used 8 threads for the query rewrite:

```
8 threads over 37 segments:
completed 1000 searches in 2996 ms: 333 QPS CPU time=218ms
checking results
0.777  0.22  100000000  0  16  100  100  0  1.00  post-filter
```

I am currently force-merging to a single segment to see what a single graph gives us.

FYI: the data set would require > 40GB of RAM to be held in memory. With int8 quantization, it's down to around 10GB.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
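A quick sketch of where the ~4x memory reduction comes from. This is back-of-the-envelope arithmetic, not from the comment itself: the vector count (100M) is read off the benchmark output, the dimensionality (~100) is a hypothetical value inferred from the ">40GB float32" figure, and the per-vector float32 correction term reflects my understanding of how Lucene's scalar quantization stores vectors — treat all three as assumptions.

```java
// Rough off-heap footprint estimate: float32 vectors vs. int8 scalar-quantized.
// numVectors is taken from the benchmark output (100000000); dims is a
// hypothetical value chosen so that float32 storage lands near the >40GB
// figure mentioned in the comment.
public class QuantizationFootprint {
    public static void main(String[] args) {
        long numVectors = 100_000_000L; // from benchmark output
        int dims = 100;                 // assumed, to match >40GB float32

        // float32: 4 bytes per dimension
        long floatBytes = numVectors * dims * Float.BYTES;

        // int8: 1 byte per dimension, plus (assumed) one float32
        // score-correction constant stored per vector
        long int8Bytes = numVectors * ((long) dims * Byte.BYTES + Float.BYTES);

        System.out.printf("float32: %.1f GB%n", floatBytes / 1e9); // 40.0 GB
        System.out.printf("int8:    %.1f GB%n", int8Bytes / 1e9);  // 10.4 GB
    }
}
```

Under these assumptions the math works out to 40.0 GB for raw float32 and about 10.4 GB quantized, which lines up with the "around 10GB" observation above.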