ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356278997
Following up on the above comment by tanyaroosta: the dataset I used to benchmark RaBitQ through luceneutil (main branch) was Amazon's ASIN and query embeddings (256-dim float vectors, 1,439,443 docs). Here are the three sets of results I got:
**Normal HNSW**
```
Results:
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.721         0.128  1439443    10       6       32         40         no    81.51          42.58             1          1455.25
 0.828         0.166  1439443    10       6       64        100         no   202.30          78.03             1          1472.66
 0.886         0.202  1439443    10       6       64        250         no   545.83         242.29             1          1493.95
 0.914         0.227  1439443    10       6       64        500         no  1111.43         452.62             1          1511.73
```
**RaBitQ HNSW (same parameters) - codec `Lucene912HnswBinaryQuantizedVectorsFormat`**
```
Results:
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.448         0.095  1439443    10       6       32         40         no    86.15          45.45             1          1526.72
 0.522         0.130  1439443    10       6       64        100         no   195.14          89.08             1          1555.16
 0.542         0.167  1439443    10       6       64        250         no   524.03         207.12             1          1589.50
 0.547         0.192  1439443    10       6       64        500         no  1035.65         415.44             1          1618.38
```
**Only RaBitQ (no HNSW graph) - codec `Lucene912BinaryQuantizedVectorsFormat`**
```
Results:
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.000        87.658  1439443    10     100       64         40         no    12.39           8.01             1          1471.69
 0.000        87.830  1439443   100     100       64         40         no    12.53           7.78             1          1471.69
```
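To put the recall gap between the two HNSW runs in numbers, here is a small standalone Java sketch that computes the absolute and relative recall drop per configuration. The values are copied from the "Normal HNSW" and "RaBitQ HNSW" tables above; the class and field names are made up for illustration.

```java
import java.util.Locale;

// Recall values copied from the "Normal HNSW" and "RaBitQ HNSW" tables above.
// Class and field names are illustrative only.
final class RecallGap {
    static final String[] CONFIG = {"32/40", "64/100", "64/250", "64/500"}; // maxConn/beamWidth
    static final double[] HNSW = {0.721, 0.828, 0.886, 0.914};
    static final double[] RABITQ_HNSW = {0.448, 0.522, 0.542, 0.547};

    /** Share of the float-HNSW recall that is lost with the RaBitQ codec. */
    static double relativeDrop(double baseline, double quantized) {
        return (baseline - quantized) / baseline;
    }

    public static void main(String[] args) {
        for (int i = 0; i < CONFIG.length; i++) {
            System.out.printf(Locale.ROOT, "%-6s abs drop %.3f, rel drop %.1f%%%n",
                    CONFIG[i], HNSW[i] - RABITQ_HNSW[i], 100 * relativeDrop(HNSW[i], RABITQ_HNSW[i]));
        }
    }
}
```

Running it gives relative drops of 37.9%, 37.0%, 38.8% and 40.2%. The gap widens slightly as beamWidth grows, because float HNSW keeps gaining recall while RaBitQ HNSW plateaus around 0.54-0.55.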
The first issue is the large recall drop for RaBitQ HNSW compared to plain HNSW (e.g. 0.547 vs. 0.914 at maxConn=64, beamWidth=500). RaBitQ recall also barely improves as beamWidth grows. Is this regression expected? Second, the RaBitQ-only run reports a recall of 0.000 at both topK=10 and topK=100, which is definitely suspect. Perhaps there is a bug on my end in how I'm benchmarking it with luceneutil? @benwtrent, I'll also try your branch of luceneutil to see if it changes the results.
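For the 0.000 recall, one thing worth ruling out is an ID mismatch between the returned results and the ground truth. If the two lists used different ID spaces, recall would sit at roughly zero regardless of topK, which would be consistent with the RaBitQ-only runs. Below is a hypothetical standalone check, not luceneutil's code. It assumes dot-product similarity and that both lists use the same vector ordinals, and the names `RecallCheck`, `exactTopK` and `recallAtK` are made up. It brute-forces the exact top-k and computes recall@k against it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical standalone sanity check, not luceneutil's implementation.
// Assumes dot-product similarity and that results and ground truth share the same vector ordinals.
final class RecallCheck {

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Exact top-k ordinals for one query by brute force, best score first. */
    static int[] exactTopK(float[][] docs, float[] query, int k) {
        float[] scores = new float[docs.length];
        // Min-heap keyed on score: the root is the weakest of the k best seen so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>((x, y) -> Float.compare(scores[x], scores[y]));
        for (int d = 0; d < docs.length; d++) {
            scores[d] = dot(docs[d], query);
            heap.offer(d);
            if (heap.size() > k) {
                heap.poll(); // evict the current lowest score
            }
        }
        int[] top = new int[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll(); // heap yields ascending scores, so fill from the back
        }
        return top;
    }

    /** Fraction of the exact top-k that appears among the first k returned IDs. */
    static double recallAtK(int[] exact, int[] returned, int k) {
        Set<Integer> truth = new HashSet<>();
        for (int i = 0; i < Math.min(k, exact.length); i++) {
            truth.add(exact[i]);
        }
        int hits = 0;
        for (int i = 0; i < Math.min(k, returned.length); i++) {
            if (truth.contains(returned[i])) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        float[][] docs = {{1f, 0f}, {0f, 1f}, {0.7f, 0.7f}, {-1f, 0f}};
        float[] query = {1f, 0.1f};
        int[] exact = exactTopK(docs, query, 2);
        System.out.println(Arrays.toString(exact)); // [0, 2]
        System.out.println(recallAtK(exact, exact, 2)); // 1.0
        // Same neighbors, but reported in a different ID space (e.g. an unmapped offset):
        int[] shifted = Arrays.stream(exact).map(id -> id + 1_000_000).toArray();
        System.out.println(recallAtK(exact, shifted, 2)); // 0.0
    }
}
```

A real check would load the query and document vectors from the benchmark's input files and compare them against the IDs the RaBitQ-only search actually returned. If that gives sensible recall while luceneutil reports 0.000, the problem is more likely in the benchmark harness than in the codec.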