ShashwatShivam commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356278997
Following up on the above comment by tanyaroosta, the dataset I was using for benchmarking RaBitQ through Luceneutil (main branch) was Amazon's ASIN and query embeddings (which are 256-dimensional float vectors). Here are three sets of results I got:

**Normal HNSW**

```
Results:
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.721         0.128  1439443    10       6       32         40         no    81.51          42.58             1          1455.25
 0.828         0.166  1439443    10       6       64        100         no   202.30          78.03             1          1472.66
 0.886         0.202  1439443    10       6       64        250         no   545.83         242.29             1          1493.95
 0.914         0.227  1439443    10       6       64        500         no  1111.43         452.62             1          1511.73
```

**RaBitQ HNSW (same parameters) - codec Lucene912HnswBinaryQuantizedVectorsFormat**

```
Results:
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.448         0.095  1439443    10       6       32         40         no    86.15          45.45             1          1526.72
 0.522         0.130  1439443    10       6       64        100         no   195.14          89.08             1          1555.16
 0.542         0.167  1439443    10       6       64        250         no   524.03         207.12             1          1589.50
 0.547         0.192  1439443    10       6       64        500         no  1035.65         415.44             1          1618.38
```

**Only RaBitQ - codec Lucene912BinaryQuantizedVectorsFormat**

```
Results:
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.000        87.658  1439443    10     100       64         40         no    12.39           8.01             1          1471.69
 0.000        87.830  1439443   100     100       64         40         no    12.53           7.78             1          1471.69
```

The first issue we're facing is the large recall reduction for RaBitQ HNSW when compared to pure HNSW. Is this recall regression expected? Secondly, the pure RaBitQ implementation returns 0 recall, which is definitely suspect. Perhaps there is a bug on my end when trying to benchmark it using luceneutil?

@benwtrent I'll also try working with your branch of luceneutil to see if it changes the results.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org