benwtrent commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1788822945

   So, I replicated the jvector benchmark (the Lucene part) using the new int8 
quantization. 
   
   Note, this is with `0` fan-out and no extra top-k gathered. Since the JVector 
benchmark didn't specify a recall target, I just measured the absolute baseline of 
`top-100`.
   
   I reserved 12GB for the heap, leaving at most 30GB for off-heap memory.
   
   ```
   1 thread over 37 segments:
   completed 1000 searches in 18411 ms: 54 QPS CPU time=18231ms
   checking results
   0.777        18.23   100000000       0       16      100     100     0       1.00    post-filter
   ```
   
   Since kNN search allows segments to be searched in parallel, I used 8 threads 
for the query rewrite:
   ```
   8 threads over 37 segments:
   completed 1000 searches in 2996 ms: 333 QPS CPU time=218ms
   checking results
   0.777         0.22   100000000       0       16      100     100     0       1.00    post-filter
   ```
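   
   For context, the rough speedup implied by the two runs above (an illustrative 
calculation, not part of the original measurements):
   
   ```python
   # Wall-clock speedup from searching the 37 segments with 8 threads,
   # using the total search times reported in the two runs above.
   single_thread_ms = 18411
   eight_threads_ms = 2996
   
   speedup = single_thread_ms / eight_threads_ms
   print(f"{speedup:.1f}x")  # ~6.1x with 8 threads
   ```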
   
   I am currently force-merging to a single segment to see what a single graph 
gives us. 
   
   FYI: the data set would require > 40GB of RAM to be held in memory. With 
int8 quantization, it's down to around 10GB.
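   
   Back-of-the-envelope math behind those memory numbers (a sketch; the vector 
count comes from the benchmark output above, while the 96-dim float32 layout is 
an assumption on my part, not stated here):
   
   ```python
   # Approximate raw vector storage, float32 vs. int8 quantized.
   # Assumptions: 100M vectors (matches the 100000000 in the benchmark output),
   # 96 dimensions, 4 bytes/dim for float32 vs. 1 byte/dim for int8.
   # int8 storage also carries a small per-vector correction term, ignored here.
   NUM_VECTORS = 100_000_000
   DIMS = 96
   
   float32_bytes = NUM_VECTORS * DIMS * 4
   int8_bytes = NUM_VECTORS * DIMS * 1
   
   print(f"float32: {float32_bytes / 1e9:.1f} GB")  # 38.4 GB
   print(f"int8:    {int8_bytes / 1e9:.1f} GB")     # 9.6 GB
   ```
   
   With graph and metadata overhead on top, that lines up with the "> 40GB" 
and "around 10GB" figures.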


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

