Re: [PR] New int4 scalar quantization [lucene]

via GitHub Tue, 26 Mar 2024 12:23:38 -0700


benwtrent commented on PR #13197:
URL: https://github.com/apache/lucene/pull/13197#issuecomment-2021292348


   I did a bunch of local benchmarking on this. I am adding a parameter to 
make compression optional, since the numbers without compression are compelling 
enough on ARM to justify it, IMO. 
   
   To achieve similar recall, `int4` without compression is about 30% faster. 
With compression it's about 30% slower, but with 50% of the memory requirements.
   
   Here are some latency vs. recall numbers for int7 and for int4 with this change.
   
   ```
   import matplotlib.pyplot as plt

   # x values are latencies, y values are the corresponding recall
   plt.plot([1.49, 1.53, 1.54, 1.83, 2.09], [0.952, 0.962, 0.965, 0.974, 0.981], marker='o', label='int7')
   plt.plot([1.72, 1.75, 1.79, 2.04, 2.48], [0.897, 0.915, 0.929, 0.971, 0.980], marker='o', label='int4_compressed')
   plt.plot([1.08, 1.12, 1.12, 1.34, 1.50], [0.897, 0.915, 0.929, 0.971, 0.980], marker='o', label='int4')
   plt.xlabel('latency')
   plt.ylabel('recall')
   plt.legend()
   plt.show()
   ```
   
![image](https://github.com/apache/lucene/assets/4357155/f825ee55-ca44-4af2-8057-d59ab6a34fd3)
   
   int4 with compression gives 2x space improvement over int7, but it comes at 
an obvious cost as we have to (un)pack bytes during dot-products.
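
To illustrate where the (un)packing cost comes from, here is a minimal Python sketch, not the code in this PR. It assumes values in `[0, 15]` and a layout that puts two adjacent values into the high and low nibble of one byte; the actual layout in the PR may differ. The names `pack_int4` and `dot_packed` are made up for this example.

```python
def pack_int4(values):
    """Pack ints in [0, 15] into bytes, two values per byte (hypothetical layout)."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    packed = bytearray(len(values) // 2)
    for i in range(0, len(values), 2):
        hi, lo = values[i], values[i + 1]
        if not (0 <= hi <= 15 and 0 <= lo <= 15):
            raise ValueError("int4 values must be in [0, 15]")
        # First value goes in the high nibble, second in the low nibble.
        packed[i // 2] = (hi << 4) | lo
    return bytes(packed)


def dot_unpacked(a, b):
    """Dot product over one-value-per-byte vectors: no bit twiddling needed."""
    return sum(x * y for x, y in zip(a, b))


def dot_packed(query, packed_doc):
    """Dot product of an unpacked query against a packed document vector.

    Every stored byte needs a shift and a mask before it can be multiplied,
    which is the extra per-element work that compression introduces.
    """
    total = 0
    for i, byte in enumerate(packed_doc):
        total += query[2 * i] * (byte >> 4)        # high nibble
        total += query[2 * i + 1] * (byte & 0x0F)  # low nibble
    return total


query = [1, 15, 0, 7]
doc = [3, 2, 9, 4]
packed = pack_int4(doc)
# Packed storage is half the size, and both paths give the same score.
assert len(packed) == len(doc) // 2
assert dot_packed(query, packed) == dot_unpacked(query, doc)
```

The packed vector takes half the bytes, but the inner loop does a shift and a mask per stored byte in addition to the multiply-adds.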
   
   Here are the numbers around index building as well. I committed every 1MB to 
ensure merging occurred and that force-merging was adequately exercised. 
   
   Int4 no compression:
   ```
   Indexed 500000 documents in 312090ms
   Force merge done in: 76169 ms
   ```
   
   Int4 compression:
   ```
   Indexed 500000 documents in 326978ms
   Force merge done in: 124961 ms
   ```
   
   Int7:
   ```
   Indexed 500000 documents in 344584ms
   Force merge done in: 98311 ms
   ```
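
For a quick relative comparison, the timings above can be normalized against int7. This is only a small arithmetic sketch over the numbers reported in this comment; the dictionary keys and the `relative_to` helper are names chosen for the example.

```python
# Timings in milliseconds, copied from the three runs above.
timings_ms = {
    "int4": {"index": 312090, "force_merge": 76169},
    "int4_compressed": {"index": 326978, "force_merge": 124961},
    "int7": {"index": 344584, "force_merge": 98311},
}


def relative_to(baseline, timings):
    """Return each timing divided by the baseline's timing for the same phase."""
    base = timings[baseline]
    return {
        name: {phase: t[phase] / base[phase] for phase in t}
        for name, t in timings.items()
    }


ratios = relative_to("int7", timings_ms)
for name, phases in ratios.items():
    print(f"{name}: index {phases['index']:.2f}x, "
          f"force merge {phases['force_merge']:.2f}x of int7")
```

On these numbers, force merge comes out at roughly 0.77x of int7 without compression and roughly 1.27x with compression, while indexing time stays within about 10% of int7 in both cases.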
   


