benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2427907718

   Here is some `luceneutil` benchmarking. Some of these numbers actually 
contradict some of my previous benchmarking for int4, which is frustrating. I 
wonder what I did wrong then or now, or if float32 got faster between then and 
now :)
   
   Regardless, this shows that bit quantization is generally as fast as int4 
search or faster, and you can get good recall with some oversampling. Combined 
with the 32x reduction in space, it's pretty nice.
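
As a rough sanity check on the 32x figure, here is a minimal sketch of the per-dimension arithmetic. It only counts the vector payload (float32 = 32 bits per dimension vs. 1 bit per dimension); the helper names are made up for illustration, and anything else an index stores is ignored.

```python
FLOAT32_BITS = 32  # bits per dimension for raw float32 vectors


def nominal_reduction(bits_per_dim: int) -> float:
    """Space reduction of the vector payload relative to float32."""
    return FLOAT32_BITS / bits_per_dim


def vector_payload_bytes(num_vectors: int, dims: int, bits_per_dim: int = 32) -> int:
    """Bytes needed for the vector values alone (no graph, no per-vector extras)."""
    return num_vectors * dims * bits_per_dim // 8


# Example: 1M vectors with 1024 dimensions (the Cohere v3 shape used below).
raw_bytes = vector_payload_bytes(1_000_000, 1024)        # float32
bit_bytes = vector_payload_bytes(1_000_000, 1024, 1)     # 1 bit per dimension
print(nominal_reduction(1), raw_bytes, bit_bytes)
```

The payload alone shrinks from about 4096MB to 128MB for that shape, which is where the 32x comes from.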
   
   The oversampling rates were `[1, 1.5, 2, 3, 4, 5]`. HNSW params were 
`m=16,efsearch=100`. Recall is reported as `Recall@100`.
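
For readers unfamiliar with oversampling, here is a minimal sketch of one common way it is applied. It assumes that oversampling means collecting `k * rate` candidates from the quantized index and rescoring them with full-precision vectors before keeping the top `k`; the comment does not spell out the exact procedure, and `approx_search` and `exact_score` are hypothetical callables, not Lucene or `luceneutil` APIs.

```python
import math


def oversampled_k(k: int, rate: float) -> int:
    """Number of candidates to collect from the quantized index."""
    return math.ceil(k * rate)


def search_with_oversampling(query, k, rate, approx_search, exact_score):
    """Collect extra candidates cheaply, then rerank them exactly.

    approx_search(query, n) -> list of doc ids   (hypothetical)
    exact_score(query, doc_id) -> float, higher is better   (hypothetical)
    """
    candidates = approx_search(query, oversampled_k(k, rate))
    reranked = sorted(candidates, key=lambda d: exact_score(query, d), reverse=True)
    return reranked[:k]


RATES = [1, 1.5, 2, 3, 4, 5]
candidate_counts = [oversampled_k(100, r) for r in RATES]
print(candidate_counts)
```

Under that assumption, the rates above correspond to 100 up to 500 candidates per query for `Recall@100`.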
   
   ## Cohere v2 1M
   
   | quantization      | Index Time | Force Merge time | Mem Required |
   |-------------------|------------|------------------|--------------|
   | 1 bit             | 395.18     | 411.67           | 175.9MB      |
   | 4 bit (compress)  | 1877.47    | 491.13           | 439.7MB      |
   | 7 bit             | 500.59     | 820.53           | 833.9MB      |
   | raw               | 493.44     | 792.04           | 3132.8MB     |
   
   
![cohere-v2-bit-1M](https://github.com/user-attachments/assets/0e704dc7-d4a2-4f4a-98ca-3b23641cd4e9)
   
   ## Cohere v3 1M
   
   1M Cohere v3 vectors, 1024 dimensions.
   
   | quantization      | Index Time | Force Merge time | Mem Required |
   |-------------------|------------|------------------|--------------|
   | 1 bit             | 338.97     | 342.61           | 208MB        |
   | 4 bit (compress)  | 1113.06    | 5490.36          | 578MB        |
   | 7 bit             | 437.63     | 744.12           | 1094MB       |
   | raw               | 408.75     | 798.11           | 4162MB       |
   
   
   
![cohere-v3-bit-1M](https://github.com/user-attachments/assets/94a25ce5-4cbf-4a7a-b07b-052ff730c9ca)
   
   ## e5Small
   
   | quantization      | Index Time | Force Merge time | Mem Required |
   |-------------------|------------|------------------|--------------|
   | 1 bit             | 161.84     | 42.37            | 57.6MB       |
   | 4 bit (compress)  | 665.54     | 660.33           | 123.2MB      |
   | 7 bit             | 267.13     | 89.99            | 219.6MB      |
   | raw               | 249.26     | 77.81            | 793.5MB      |
   
   
![e5small-bit-500k](https://github.com/user-attachments/assets/d649a54b-9da6-454f-9d9b-f7ff2b53ac78)
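
To compare the three tables above at a glance, here is a small sketch that computes each format's "Mem Required" reduction relative to raw. The numbers are copied from the tables; the dictionary layout and function name are only for illustration.

```python
# "Mem Required" values in MB, copied from the three tables above.
MEM_MB = {
    "Cohere v2 1M": {"1 bit": 175.9, "4 bit (compress)": 439.7, "7 bit": 833.9, "raw": 3132.8},
    "Cohere v3 1M": {"1 bit": 208.0, "4 bit (compress)": 578.0, "7 bit": 1094.0, "raw": 4162.0},
    "e5Small": {"1 bit": 57.6, "4 bit (compress)": 123.2, "7 bit": 219.6, "raw": 793.5},
}


def reduction_vs_raw(mem: dict) -> dict:
    """Ratio of raw memory to each quantized format's memory, rounded to 1 decimal."""
    raw = mem["raw"]
    return {name: round(raw / mb, 1) for name, mb in mem.items() if name != "raw"}


for dataset, mem in MEM_MB.items():
    print(dataset, reduction_vs_raw(mem))
```

By this measure 1 bit comes out at roughly 14x to 20x smaller than raw rather than the nominal 32x. Presumably that is because "Mem Required" covers more than the quantized vector values themselves, but that is a guess; the comment does not break the figure down.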
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

