Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

via GitHub Fri, 06 Oct 2023 07:47:43 -0700


benwtrent commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1750806182


   I ran some benchmarks with the current change, using 400k Cohere embeddings 
searching over 1k vectors. This was on GCP `c3-standard-8` (Intel Sapphire 
Rapids). On this machine, `byte[]` comparisons run in about the same time as `float[]`.
   
   Index buffer: 128MB
   Force merged to one segment.
   
   I noticed that force-merging was about 35% faster.
   
   For query latency:
   
   | query         | recall | latency |
   |---------------|--------|---------|
   | raw           | 0.845  | 0.31ms  |
   | quantized @10 | 0.800  | 0.23ms  |
   | quantized @20 | 0.862  | 0.29ms  |
   
   Disk usage:
    - raw: 1172MB
    - quantized: 295MB
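
   To put the numbers above side by side, here is a small sketch that derives 
the compression ratio and the relative latency and recall changes from the 
reported figures. The variable and function names are illustrative only, and it 
assumes the `quantized @20` latency is in milliseconds like the other rows.

```python
# Figures copied from the benchmark comment; nothing here is measured anew.
RAW_MB = 1172
QUANTIZED_MB = 295

# (recall, latency in ms); the @20 latency unit is assumed to be ms.
RESULTS = {
    "raw": (0.845, 0.31),
    "quantized @10": (0.800, 0.23),
    "quantized @20": (0.862, 0.29),
}


def relative_change(baseline: float, value: float) -> float:
    """Signed fractional change of value with respect to baseline."""
    return (value - baseline) / baseline


def summarize(results: dict, baseline_key: str = "raw") -> dict:
    """Recall delta (absolute) and latency change (relative) versus the baseline row."""
    base_recall, base_latency = results[baseline_key]
    summary = {}
    for name, (recall, latency) in results.items():
        if name == baseline_key:
            continue
        summary[name] = {
            "recall_delta": recall - base_recall,
            "latency_change": relative_change(base_latency, latency),
        }
    return summary


compression_ratio = RAW_MB / QUANTIZED_MB
summary = summarize(RESULTS)

print(f"disk: {compression_ratio:.2f}x smaller")
for name, row in summary.items():
    print(
        f"{name}: recall {row['recall_delta']:+.3f}, "
        f"latency {row['latency_change']:+.1%}"
    )
```

   The disk ratio comes out to roughly 3.97x, close to the 4x you would expect 
from storing each dimension in one byte instead of a four-byte float. Latency 
is about 26% lower at @10 and about 6% lower at @20, with recall 0.045 below 
and 0.017 above the raw baseline respectively.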


