kaivalnp commented on PR #15784:
URL: https://github.com/apache/lucene/pull/15784#issuecomment-3981253652

   Corresponding `luceneutil` PR used for benchmarking: 
https://github.com/mikemccand/luceneutil/pull/542
   
   Cohere v3 vectors, 1024d, `DOT_PRODUCT`, 400K docs, 10K queries, no 
quantization, force merge enabled:
   
   Baseline
   
   ```
   recall  latency(ms)  netCPU  avgCpuCount  traversalSimilarity  resultSimilarity  visited
    0.983        4.314   4.218        0.978                 0.74               0.8     7987
    0.979        2.209   2.159        0.977                 0.76               0.8     2614
    0.969        1.470   1.434        0.976                 0.78               0.8      958
    0.942        1.210   1.180        0.975                  0.8               0.8      513
   ```
   
   Candidate
   
   ```
   recall  latency(ms)  netCPU  avgCpuCount  resultSimilarity  visited
    0.983        1.415   1.407        0.995               0.8      757
   ```
   
   The query is now simpler (no `traversalSimilarity` to tune) and reaches
   better recall at the same latency. Previously, queries in sparse spaces
   needed a low `traversalSimilarity`, which led to high latency for queries in
   dense spaces. Search could also get stuck in local maxima, which led to low
   recall.
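
To put numbers on the comparison, here is a small sketch that takes the two rows with equal recall (0.983) from the tables above and computes how much lower the candidate's latency and visited count are. The class and method names are made up for illustration; only the four input values come from the benchmark output. It works out to roughly 3.05x lower latency and 10.55x fewer visited nodes.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name) comparing the two recall = 0.983 rows. */
class BenchmarkComparison {

    /** Ratio of baseline to candidate for a lower-is-better metric. */
    static double ratio(double baseline, double candidate) {
        return baseline / candidate;
    }

    public static void main(String[] args) {
        // Baseline row with recall 0.983 (traversalSimilarity = 0.74).
        double baselineLatencyMs = 4.314;
        int baselineVisited = 7987;

        // Candidate row with recall 0.983.
        double candidateLatencyMs = 1.415;
        int candidateVisited = 757;

        System.out.printf(Locale.ROOT, "latency: %.2fx lower%n",
                ratio(baselineLatencyMs, candidateLatencyMs));
        System.out.printf(Locale.ROOT, "visited: %.2fx fewer%n",
                ratio(baselineVisited, candidateVisited));
    }
}
```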
   
   API-wise, all classes are marked `@lucene.experimental`, so we can change
   the algorithm or remove parameters. Users who really want the old behavior
   (or a more sophisticated algorithm) can still get it by overriding
   [this function](https://github.com/apache/lucene/blob/f021aa55853c8b446404c8616ec247027774ae07/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L64-L67)
   with a custom collector.
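
For readers unfamiliar with the two-threshold idea, a dependency-free toy sketch of what such a custom collector might track is shown here. This is not Lucene's API: the class name, method names, and the exact rule in `minCompetitiveSimilarity` are assumptions for illustration, and a real override would need to implement Lucene's own collector interfaces instead.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a two-threshold similarity collector.
 * Names and structure are assumptions, not Lucene classes.
 */
class TwoThresholdCollector {
    private final float traversalSimilarity; // keep exploring nodes at or above this
    private final float resultSimilarity;    // only report hits at or above this
    private float maxSimilarity = Float.NEGATIVE_INFINITY;
    private final List<Integer> results = new ArrayList<>();

    TwoThresholdCollector(float traversalSimilarity, float resultSimilarity) {
        if (traversalSimilarity > resultSimilarity) {
            throw new IllegalArgumentException(
                "traversalSimilarity must not exceed resultSimilarity");
        }
        this.traversalSimilarity = traversalSimilarity;
        this.resultSimilarity = resultSimilarity;
    }

    /** Record a scored node; keep it as a hit only if it clears the result threshold. */
    void collect(int docId, float similarity) {
        maxSimilarity = Math.max(maxSimilarity, similarity);
        if (similarity >= resultSimilarity) {
            results.add(docId);
        }
    }

    /**
     * Lowest similarity still worth traversing (assumed rule): stay permissive
     * until something at or above traversalSimilarity has been seen, then cap there.
     */
    float minCompetitiveSimilarity() {
        return Math.min(traversalSimilarity, maxSimilarity);
    }

    List<Integer> results() {
        return results;
    }
}
```

With `traversalSimilarity = 0.74` and `resultSimilarity = 0.8`, a node scored 0.78 would keep the traversal going but would not be returned, which is the kind of tuning trade-off the removed parameter required.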
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

