ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470937060

   I ran a benchmark on Cohere's 768-dimensional data. These are the steps I followed, listed for reproducibility:
   
   1. **Set up** the [luceneutil 
repository](https://github.com/mikemccand/luceneutil/) following the 
installation instructions provided.
   2. **Switch branches** in luceneutil to [this specific branch](https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:bbq), since the latest mainline branch is not compatible with the feature needed for this experiment.
   3. **Change the branch** of the `lucene_candidate` checkout to [benwtrent:feature/adv-binarization-format](https://github.com/benwtrent/lucene/tree/feature/adv-binarization-format) so that the advanced binarization format is available.
   4. **Run** `knnPerfTest.py` after setting the document and query file paths to point at the stored Cohere data files. I used the following runtime parameters:
      - `nDoc = 500,000`
      - `topk = 10`
      - `fanout = 100`
      - `maxConn = 32`
      - `beamWidth = 100`
      - `oversample` values tested: `{1, 1.5, 2, 3, 4, 5}`
      
      I used `quantizeBits = 1` for RaBitQ+HNSW and `quantizeBits = 32` for 
regular HNSW.
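
The parameter sweep above can be sketched as a small grid in Python. This is only an illustration of the combinations described, not the actual configuration structure of `knnPerfTest.py`: the dictionary layout and the `build_runs` helper are hypothetical, and crossing every `oversample` value with both `quantizeBits` settings is an assumption, since the comment does not say whether oversampling was also applied to regular HNSW.

```python
from itertools import product

# Fixed parameters, using the values reported in the comment.
BASE = {"nDoc": 500_000, "topk": 10, "fanout": 100, "maxConn": 32, "beamWidth": 100}

# Oversample values that were tested.
OVERSAMPLE = (1, 1.5, 2, 3, 4, 5)

# quantizeBits per configuration: 1 for RaBitQ+HNSW, 32 for regular HNSW.
QUANTIZE_BITS = {"RaBitQ+HNSW": 1, "HNSW": 32}


def build_runs():
    """Return one parameter dict per (configuration, oversample) pair.

    Hypothetical helper: assumes the oversample sweep is crossed with
    both quantizeBits settings, which the comment does not confirm.
    """
    runs = []
    for (label, bits), oversample in product(QUANTIZE_BITS.items(), OVERSAMPLE):
        runs.append({**BASE, "label": label, "quantizeBits": bits, "oversample": oversample})
    return runs


if __name__ == "__main__":
    for run in build_runs():
        print(run)
```

Each resulting dict corresponds to one point on the recall-latency curve for its configuration.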
   
   I compared regular HNSW against RaBitQ+HNSW. The resulting recall-latency tradeoff is shown in the attached image:

![output](https://github.com/user-attachments/assets/8f5f8795-8386-422a-8a9a-d7fd9e7051d2)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

