lpld commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2698569925
Hi @benwtrent Thanks again for your previous comment. I was able to modify luceneutil and run some benchmarks. I am quite new to Lucene, so I would appreciate some help in understanding the results I'm getting.

First, I tried running a quantized and a non-quantized benchmark on the Cohere 768 dataset on my local machine. Here are the results for the quantized benchmark (with `Lucene102HnswBinaryQuantizedVectorsFormat`):

```
recall  latency (ms)  nDoc      topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
0.452   11.655        10000000  100   50      16       100        1 bits     1914.86  5222.32       10112.13       1             30934.82         30212.402      915.527
```

Unfortunately I didn't save the non-quantized results, but the recall was around 0.73.

Then I ran the same tests on a dedicated server with more CPU and RAM, and the results were strange: they were much faster, but the recall was now very low.

Non-quantized:

```
recall  latency (ms)  nDoc      topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
0.203   7.143         10000000  100   50      16       100        no         1403.40  7125.53       769.29         1             29470.29         29296.875      29296.875
```

Quantized:

```
recall  latency (ms)  nDoc      topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
0.191   7.721         10000000  100   50      16       100        1 bits     511.40   19554.09      1116.80        1             30597.43         30212.402      915.527
```

So, my questions are:

1. What exactly do the numbers in the description of this pull request mean? When you say that the recall for Cohere 768 is 0.938, is that the absolute recall value you got from the benchmark, or some kind of ratio between the quantized and non-quantized recalls?
2. Do you have any ideas about what could cause such a huge difference in recall between benchmark runs on different environments?
3. I was also trying to do some benchmarking with other public datasets (without luceneutil), and I got a little confused about how to correctly calculate recall. I understand that recall is the ratio of correct responses to the total number of responses. The total number of responses is straightforward, but the number of correct ones is a bit confusing to me. (I've put a short sketch of my recall computation at the end of this comment.) `luceneutil` computes the ground-truth neighbors roughly as follows (not the exact code, but my variation):

   ```java
   var queryVector = new ConstKnnByteVectorValueSource(queryEmb);
   var docVectors = new ByteKnnVectorFieldSource("vector");
   var exactQuery = new BooleanQuery.Builder()
       .add(new FunctionQuery(new ByteVectorSimilarityFunction(similarity, queryVector, docVectors)), BooleanClause.Occur.SHOULD)
       .add(new MatchAllDocsQuery(), BooleanClause.Occur.FILTER)
       .build();
   ```

   However, the `lucene` unit tests use a different query to get the correct neighbors from the index:

   ```java
   var exactQuery = new KnnByteVectorQuery("vector", queryEmb, size, new MatchAllDocsQuery());
   ```

   I would appreciate it if you could give some insight into which of these queries is the correct one, because they return different results.

Thanks for your time!
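P.S. For context on question 3, here is a minimal sketch of how I'm currently computing recall for a single query, assuming the doc IDs from the exact (brute-force) search are the ground truth. The class, method, and variable names (`RecallSketch`, `recall`, `exactTopK`, `approxTopK`) are just my own illustration, not anything from luceneutil; I'm also assuming the recall column in the tables above is this value averaged over all query vectors.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class RecallSketch {
  // Recall for one query: |approx top-K ∩ exact top-K| / K.
  static double recall(int[] exactTopK, int[] approxTopK) {
    // doc IDs returned by the exact (ground-truth) search
    Set<Integer> truth = Arrays.stream(exactTopK).boxed().collect(Collectors.toSet());
    // how many of the approximate (HNSW) results are true nearest neighbors
    long hits = Arrays.stream(approxTopK).filter(truth::contains).count();
    return (double) hits / exactTopK.length;
  }
}
```

So my uncertainty is really about which of the two queries above should produce `exactTopK`, since they give me different doc IDs to use as ground truth.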