[ https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397664#comment-17397664 ]
Michael Sokolov commented on LUCENE-9937:
-----------------------------------------

I'm glad you found the batch setup was confounding things! It's hard to do an A/A comparison between these different implementations!

> ann-benchmarks results for HNSW search
> --------------------------------------
>
>                 Key: LUCENE-9937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9937
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> *NOTE: These benchmarks contained an error that made hnswlib perform too well. See the last comment for correct results.*
>
> I hooked up our HNSW implementation to [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used repo for benchmarking nearest neighbor search libraries against large datasets. I found the results interesting and opened this issue to share and discuss. My benchmarking code can be found [here|https://github.com/jtibshirani/lucene/pull/1]; it's hacky and not easy to commit, but I'm happy to help anyone get set up with it.
>
> Approaches:
> * LuceneVectorsOnly: a baseline that only indexes vectors and performs a brute-force scan to determine the nearest neighbors
> * LuceneHnsw: our HNSW implementation
> * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation from the author of the paper
>
> Datasets:
> * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, comparing euclidean distance
> * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, comparing cosine similarity
>
> *Results on sift-128-euclidean*
> Parameters: M=16, efConstruction=500
> {code:java}
> Approach                   Recall    QPS
> LuceneVectorsOnly()        1.000     6.764
> LuceneHnsw(n_cands=10)     0.603     7736.968
> LuceneHnsw(n_cands=50)     0.890     3605.000
> LuceneHnsw(n_cands=100)    0.953     2237.429
> LuceneHnsw(n_cands=500)    0.996     570.900
> LuceneHnsw(n_cands=800)    0.998     379.589
> hnswlib(n_cands=10)        0.713     69662.756
> hnswlib(n_cands=50)        0.950     28021.582
> hnswlib(n_cands=100)       0.985     16108.538
> hnswlib(n_cands=500)       1.000     4115.886
> hnswlib(n_cands=800)       1.000     2729.680
> {code}
>
> *Results on glove-100-angular*
> Parameters: M=32, efConstruction=500
> {code:java}
> Approach                   Recall    QPS
> LuceneVectorsOnly()        1.000     6.722
> LuceneHnsw(n_cands=10)     0.507     5036.236
> LuceneHnsw(n_cands=50)     0.760     2099.850
> LuceneHnsw(n_cands=100)    0.833     1233.690
> LuceneHnsw(n_cands=500)    0.941     309.077
> LuceneHnsw(n_cands=800)    0.961     203.782
> hnswlib(n_cands=10)        0.597     43543.345
> hnswlib(n_cands=50)        0.832     14719.165
> hnswlib(n_cands=100)       0.897     8299.948
> hnswlib(n_cands=500)       0.981     1931.985
> hnswlib(n_cands=800)       0.991     881.752
> {code}
>
> Notes on benchmark:
> * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and computes the recall against the true neighbors. The recall calculation has a small 'fudge factor' that allows neighbors that are within a small epsilon of the best distance. Queries are executed serially to obtain the QPS.
> * I chose parameters where hnswlib performed well, then passed these same parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and beamWidth as efConstruction. For search parameters, I set k to k, and fanout as (num_cands - k) so that the beam search is of size num_cands (see the sketch after this list). Note that our default value for beamWidth is 16, which is really low; I wasn't able to obtain acceptable recall until I bumped it closer to 500 to match the hnswlib default.
> * I force-merged to one segment before running searches, since this gives the best recall + QPS, and also to match hnswlib.
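> The parameter mapping is just arithmetic, but it's easy to get wrong, so here is a minimal sketch of how num_cands translates into (k, fanout), plus the force-merge step. The helper names are hypothetical, not Lucene or ann-benchmarks APIs; only IndexWriter.forceMerge is a real Lucene call.
> {code:java}
> import java.io.IOException;
> import org.apache.lucene.index.IndexWriter;
>
> class BenchmarkSetup {
>   // num_cands maps onto Lucene's (k, fanout): the graph search explores a
>   // beam of k + fanout candidates, so fanout = num_cands - k yields a beam
>   // of exactly num_cands. E.g. k=10, num_cands=100 -> fanout=90.
>   static int fanoutFor(int numCands, int k) {
>     return Math.max(0, numCands - k);
>   }
>
>   // Force-merge to a single segment before searching, as described above,
>   // so each query traverses only one HNSW graph. `writer` is an
>   // already-populated IndexWriter.
>   static void prepareForSearch(IndexWriter writer) throws IOException {
>     writer.forceMerge(1);
>     writer.close();
>   }
> }
> {code}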
> Some observations:
> * It'd be really nice to extend luceneutil to measure vector search recall in addition to latency. That would help ensure we're benchmarking a more realistic scenario, instead of accidentally indexing/searching at a very low recall. Tracking recall would also guard against subtle, unintentional changes to the algorithm: it's easy to make an accidental change while refactoring, and with approximate algorithms, unit tests don't guard as well against this. (A sketch of the recall computation follows this list.)
> * Lucene HNSW gives a great speed-up over the baseline without sacrificing too much recall. But it doesn't perform as well as hnswlib in terms of both recall and QPS. We wouldn't expect the results to line up perfectly, since Lucene doesn't actually implement HNSW: the current algorithm isn't hierarchical and only uses a single graph layer. Might this difference indicate we're leaving performance 'on the table' by not using layers, which (I don't think) would add much index time or space? Or are there other algorithmic improvements that would help close the gap?
> * Setting beamWidth to 500 *really* slowed down indexing. I'll open a separate issue with indexing speed results, keeping this one focused on search.
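> To make the luceneutil suggestion concrete, here is a minimal sketch of the recall computation in the spirit of ann-benchmarks: a retrieved neighbor counts as a hit if its distance is within a small epsilon of the true k-th nearest distance. The class and method names are assumptions for illustration, not an existing luceneutil or ann-benchmarks API.
> {code:java}
> class RecallMetric {
>   /**
>    * resultDists: distances of the approximate results to the query.
>    * trueDists:   distances of the exact k nearest neighbors, ascending.
>    * epsilon:     the small 'fudge factor' tolerance described above.
>    */
>   static double recallWithEpsilon(float[] resultDists, float[] trueDists, float epsilon) {
>     // Anything at least as close as the true k-th neighbor (plus epsilon)
>     // counts as a correct result.
>     float threshold = trueDists[trueDists.length - 1] + epsilon;
>     int hits = 0;
>     for (float d : resultDists) {
>       if (d <= threshold) {
>         hits++;
>       }
>     }
>     return (double) hits / trueDists.length;
>   }
> }
> {code}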