mulugetam opened a new issue, #12091:
URL: https://github.com/apache/lucene/issues/12091

   ### Description
   
   Lucene's implementation of ANN relies on scalar implementations of the vector similarity functions [dot product](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d261545d3182c22fa/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java#L53), [Euclidean distance](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d261545d3182c22fa/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java#L34), and [cosine](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d261545d3182c22fa/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java#L71). A vectorized (SIMD) implementation of these functions is quite straightforward.
   
   Below are JMH results comparing the vector implementations of `dot product` and `Euclidean` distance against the equivalent default (scalar, loop-unrolled) implementations.
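
For readers unfamiliar with the baseline, here is a minimal sketch of what a scalar, loop-unrolled similarity function looks like. It is not Lucene's actual code: the class name `ScalarSimilaritySketch`, the 4-way unroll factor, and the helper `checkSameLength` are assumptions chosen for illustration only.

```java
final class ScalarSimilaritySketch {

  /** Dot product with a 4-way unrolled main loop and a scalar tail loop. */
  static float dotProduct(float[] a, float[] b) {
    checkSameLength(a, b);
    float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
    int i = 0;
    int bound = a.length & ~3; // largest multiple of 4 that is <= length
    for (; i < bound; i += 4) {
      s0 += a[i] * b[i];
      s1 += a[i + 1] * b[i + 1];
      s2 += a[i + 2] * b[i + 2];
      s3 += a[i + 3] * b[i + 3];
    }
    float sum = s0 + s1 + s2 + s3;
    for (; i < a.length; i++) { // remaining 0..3 elements
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Squared Euclidean distance, unrolled the same way. */
  static float squareDistance(float[] a, float[] b) {
    checkSameLength(a, b);
    float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
    int i = 0;
    int bound = a.length & ~3;
    for (; i < bound; i += 4) {
      float d0 = a[i] - b[i];
      float d1 = a[i + 1] - b[i + 1];
      float d2 = a[i + 2] - b[i + 2];
      float d3 = a[i + 3] - b[i + 3];
      s0 += d0 * d0;
      s1 += d1 * d1;
      s2 += d2 * d2;
      s3 += d3 * d3;
    }
    float sum = s0 + s1 + s2 + s3;
    for (; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Cosine similarity, written as a plain loop for brevity. */
  static float cosine(float[] a, float[] b) {
    checkSameLength(a, b);
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  // Hypothetical helper: rejects arrays of different lengths.
  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch: " + a.length + " != " + b.length);
    }
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f, 4f, 5f};
    float[] b = {5f, 4f, 3f, 2f, 1f};
    System.out.println(dotProduct(a, b));
    System.out.println(squareDistance(a, b));
    System.out.println(cosine(a, a));
  }
}
```

The independent accumulators `s0..s3` give the JIT more freedom to schedule the multiply-adds; a SIMD version goes one step further and processes several array elements per instruction.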
   
   `dim` is the length of the `float[]` arrays under test, and `Score` is the number of dot product or Euclidean distance operations per second. `Gain` is the throughput relative to the scalar baseline at the same `dim`.
   
   ```
   Benchmark              dim   Mode    Cnt             Score          Error   Units     Gain
   ------------------------------------------------------------------------------------------
   scalarDotProduct        60   thrpt    12      32031825.541 ±     6151.580   ops/s     1.00
   scalarDotProduct       120   thrpt    12      17120537.911 ±     5793.505   ops/s     1.00
   scalarDotProduct       480   thrpt    12       4506350.215 ±     1677.755   ops/s     1.00
   vectorDotProduct        60   thrpt    12      98862701.038 ±    85554.695   ops/s     3.09
   vectorDotProduct       120   thrpt    12      99059913.888 ±    20609.182   ops/s     5.79
   vectorDotProduct       480   thrpt    12     220320941.436 ±   173467.603   ops/s    48.89
   ```
   
   ```
   Benchmark              dim   Mode    Cnt             Score          Error   Units     Gain
   ------------------------------------------------------------------------------------------
   scalarSquareDistance    60   thrpt    12      25890614.822 ±     7071.413   ops/s     1.00
   scalarSquareDistance   120   thrpt    12      12524294.760 ±     3435.882   ops/s     1.00
   scalarSquareDistance   480   thrpt    12       3145045.026 ±      409.361   ops/s     1.00
   vectorSquareDistance    60   thrpt    12     104317302.765 ±    36895.474   ops/s     4.03
   vectorSquareDistance   120   thrpt    12     122083614.889 ±    11821.642   ops/s     9.75
   vectorSquareDistance   480   thrpt    12     362229408.898 ±    85439.065   ops/s   115.17
   ```
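
The `Gain` values in the two tables above are consistent with dividing each benchmark's throughput by the scalar throughput at the same `dim`. The sketch below reproduces that arithmetic for a few rows; the class name `GainSketch` and the method `gain` are hypothetical and exist only for this illustration.

```java
final class GainSketch {

  /** Ratio of a candidate's throughput to the baseline's, both in ops/s. */
  static double gain(double candidateOpsPerSec, double baselineOpsPerSec) {
    if (baselineOpsPerSec <= 0) {
      throw new IllegalArgumentException("baseline throughput must be positive");
    }
    return candidateOpsPerSec / baselineOpsPerSec;
  }

  public static void main(String[] args) {
    // Scores copied from the tables above (vector score, scalar score).
    System.out.printf("dot product, dim=60:       %.2f%n", gain(98862701.038, 32031825.541));
    System.out.printf("dot product, dim=480:      %.2f%n", gain(220320941.436, 4506350.215));
    System.out.printf("square distance, dim=60:   %.2f%n", gain(104317302.765, 25890614.822));
    System.out.printf("square distance, dim=480:  %.2f%n", gain(362229408.898, 3145045.026));
  }
}
```

Note that the scalar throughput drops roughly in proportion to `dim`, while the reported vector throughput rises with it, which is why the gain grows so sharply at `dim` = 480.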
   
   I also tested this with [msokolov's ANN benchmark suite](https://github.com/msokolov/ann-benchmarks) and saw a speedup of more than 2x in both indexing (docs/sec) and search (QPS). I will open a PR for it soon.
   
   Let's discuss this :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

