[ https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397664#comment-17397664 ]
Michael Sokolov commented on LUCENE-9937:
-----------------------------------------

I'm glad you found the batch setup was confounding things! It's hard to do an A/A comparison between these different implementations!

> ann-benchmarks results for HNSW search
> --------------------------------------
>
>                 Key: LUCENE-9937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9937
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> *NOTE: These benchmarks contained an error that made hnswlib perform too well. See the last comment for correct results.*
>
> I hooked up our HNSW implementation to [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used repo for benchmarking nearest neighbor search libraries against large datasets. I found the results interesting and opened this issue to share and discuss. My benchmarking code can be found [here|https://github.com/jtibshirani/lucene/pull/1]; it's hacky and not easy to commit, but I'm happy to help anyone get set up with it.
>
> Approaches:
> * LuceneVectorsOnly: a baseline that only indexes vectors and performs a brute-force scan to determine the nearest neighbors
> * LuceneHnsw: our HNSW implementation
> * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation from the author of the paper
>
> Datasets:
> * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, comparing euclidean distance
> * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, comparing cosine similarity
>
> *Results on sift-128-euclidean*
> Parameters: M=16, efConstruction=500
> {code:java}
> Approach                   Recall    QPS
> LuceneVectorsOnly()        1.000     6.764
> LuceneHnsw(n_cands=10)     0.603     7736.968
> LuceneHnsw(n_cands=50)     0.890     3605.000
> LuceneHnsw(n_cands=100)    0.953     2237.429
> LuceneHnsw(n_cands=500)    0.996     570.900
> LuceneHnsw(n_cands=800)    0.998     379.589
> hnswlib(n_cands=10)        0.713     69662.756
> hnswlib(n_cands=50)        0.950     28021.582
> hnswlib(n_cands=100)       0.985     16108.538
> hnswlib(n_cands=500)       1.000     4115.886
> hnswlib(n_cands=800)       1.000     2729.680
> {code}
>
> *Results on glove-100-angular*
> Parameters: M=32, efConstruction=500
> {code:java}
> Approach                   Recall    QPS
> LuceneVectorsOnly()        1.000     6.722
> LuceneHnsw(n_cands=10)     0.507     5036.236
> LuceneHnsw(n_cands=50)     0.760     2099.850
> LuceneHnsw(n_cands=100)    0.833     1233.690
> LuceneHnsw(n_cands=500)    0.941     309.077
> LuceneHnsw(n_cands=800)    0.961     203.782
> hnswlib(n_cands=10)        0.597     43543.345
> hnswlib(n_cands=50)        0.832     14719.165
> hnswlib(n_cands=100)       0.897     8299.948
> hnswlib(n_cands=500)       0.981     1931.985
> hnswlib(n_cands=800)       0.991     881.752
> {code}
>
> Notes on benchmark:
> * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and computes the recall against the true neighbors. The recall calculation has a small 'fudge factor' that allows neighbors that are within a small epsilon of the best distance. Queries are executed serially to obtain the QPS.
> * I chose parameters where hnswlib performed well, then passed these same parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and beamWidth as efConstruction. For search parameters, I set k to k, and fanout as (num_cands - k) so that the beam search is of size num_cands (see the sketch after this list). Note that our default value for beamWidth is 16, which is really low; I wasn't able to obtain acceptable recall until I bumped it closer to 500 to match the hnswlib default.
> * I force-merged to one segment before running searches, since this gives the best recall + QPS, and also to match hnswlib.
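> The parameter mapping is just arithmetic, but it's easy to get wrong, so here is a minimal sketch of how num_cands translates into (k, fanout), plus the force-merge step. The helper names are hypothetical, not Lucene or ann-benchmarks APIs; only IndexWriter.forceMerge is a real Lucene call.
> {code:java}
> import java.io.IOException;
> import org.apache.lucene.index.IndexWriter;
>
> class BenchmarkSetup {
>   // num_cands maps onto Lucene's (k, fanout): the graph search explores a
>   // beam of k + fanout candidates, so fanout = num_cands - k yields a beam
>   // of exactly num_cands. E.g. k=10, num_cands=100 -> fanout=90.
>   static int fanoutFor(int numCands, int k) {
>     return Math.max(0, numCands - k);
>   }
>
>   // Force-merge to a single segment before searching, as described above,
>   // so each query traverses only one HNSW graph. `writer` is an
>   // already-populated IndexWriter.
>   static void prepareForSearch(IndexWriter writer) throws IOException {
>     writer.forceMerge(1);
>     writer.close();
>   }
> }
> {code}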
> Some observations:
> * It'd be really nice to extend luceneutil to measure vector search recall in addition to latency. That would help ensure we're benchmarking a more realistic scenario, instead of accidentally indexing/searching at a very low recall. Tracking recall would also guard against subtle, unintentional changes to the algorithm: it's easy to make an accidental change while refactoring, and with approximate algorithms, unit tests don't guard as well against this. (A sketch of the recall computation follows this list.)
> * Lucene HNSW gives a great speed-up over the baseline without sacrificing too much recall. But it doesn't perform as well as hnswlib in terms of both recall and QPS. We wouldn't expect the results to line up perfectly, since Lucene doesn't actually implement HNSW: the current algorithm isn't hierarchical and only uses a single graph layer. Might this difference indicate we're leaving performance 'on the table' by not using layers, which (I don't think) would add much index time or space? Or are there other algorithmic improvements that would help close the gap?
> * Setting beamWidth to 500 *really* slowed down indexing. I'll open a separate issue with indexing speed results, keeping this one focused on search.
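> To make the luceneutil suggestion concrete, here is a minimal sketch of the recall computation in the spirit of ann-benchmarks: a retrieved neighbor counts as a hit if its distance is within a small epsilon of the true k-th nearest distance. The class and method names are assumptions for illustration, not an existing luceneutil or ann-benchmarks API.
> {code:java}
> class RecallMetric {
>   /**
>    * resultDists: distances of the approximate results to the query.
>    * trueDists:   distances of the exact k nearest neighbors, ascending.
>    * epsilon:     the small 'fudge factor' tolerance described above.
>    */
>   static double recallWithEpsilon(float[] resultDists, float[] trueDists, float epsilon) {
>     // Anything at least as close as the true k-th neighbor (plus epsilon)
>     // counts as a correct result.
>     float threshold = trueDists[trueDists.length - 1] + epsilon;
>     int hits = 0;
>     for (float d : resultDists) {
>       if (d <= threshold) {
>         hits++;
>       }
>     }
>     return (double) hits / trueDists.length;
>   }
> }
> {code}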