[
https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julie Tibshirani updated LUCENE-9937:
-
Description:
I hooked up our HNSW implementation to
[ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used repo
for benchmarking nearest neighbor search libraries against large datasets. I
found the results interesting and opened this issue to share and discuss. My
benchmarking code can be found
[here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy
to commit but I’m happy to help anyone get set up with it.
Approaches
* LuceneVectorsOnly: a baseline that only indexes vectors, and performs a
brute force scan to determine nearest neighbors
* LuceneHnsw: our HNSW implementation
* [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation from
the author of the paper
Datasets
* sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, comparing
euclidean distance
* glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, comparing
cosine similarity
*Results on sift-128-euclidean*
Parameters: M=16, efConstruction=500
{code:java}
ApproachRecall QPS
LuceneVectorsOnly() 1.000 6.764
LuceneHnsw(n_cands=10) 0.603 7736.968
LuceneHnsw(n_cands=50) 0.890 3605.000
LuceneHnsw(n_cands=100) 0.953 2237.429
LuceneHnsw(n_cands=500) 0.996570.900
LuceneHnsw(n_cands=800) 0.998379.589
hnswlib(n_cands=10) 0.713 69662.756
hnswlib(n_cands=50) 0.985 16108.538
hnswlib(n_cands=100)0.950 28021.582
hnswlib(n_cands=500)1.000 4115.886
hnswlib(n_cands=800)1.000 2729.680
{code}
*Results on glove-100-angular*
Parameters: M=32, efConstruction=500
{code:java}
ApproachRecall QPS
LuceneVectorsOnly() 1.000 6.764
LuceneHnsw(n_cands=10) 0.507 5036.236
LuceneHnsw(n_cands=50) 0.760 2099.850
LuceneHnsw(n_cands=100) 0.833 1233.690
LuceneHnsw(n_cands=500) 0.941309.077
LuceneHnsw(n_cands=800) 0.961203.782
hnswlib(n_cands=10) 0.597 43543.345
hnswlib(n_cands=50) 0.832 14719.165
hnswlib(n_cands=100)0.897 8299.948
hnswlib(n_cands=500)0.981 1931.985
hnswlib(n_cands=800)0.991881.752
{code}
Notes on benchmark:
* By default, the ann-benchmarks repo retrieves 10 nearest neighbors and
computes the recall against the true neighbors. The recall calculation has a
small 'fudge factor' that allows neighbors that are within a small epsilon of
the best distance. Queries are executed serially to obtain the QPS.
* I chose parameters where hnswlib performed well, then passed these same
parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and
beamWidth as efConstruction. For search parameters, I set k to k, and fanout as
(num_cands - k) so that the beam search is of size num_cands. Note that our
default value for beamWidth is 16, which is really low – I wasn’t able to
obtain acceptable recall until I bumped it to closer to 500 to match the
hnswlib default.
* I force-merged to one segment before running searches since this gives the
best recall + QPS, and also to match hnswlib.
Some observations:
* It'd be really nice to extend luceneutil to measure vector search recall in
addition to latency. That would help ensure we’re benchmarking a more realistic
scenario, instead of accidentally indexing/ searching at a very low recall.
Tracking recall would also guard against subtle, unintentional changes to the
algorithm. It's easy to make an accidental change while refactoring, and with
approximate algorithms, unit tests don't guard as well against this.
* Lucene HNSW gives a great speed-up over the baseline without sacrificing too
much recall. But it doesn't perform as well as hnswlib in terms of both recall
and QPS. We wouldn’t expect the results to line up perfectly, since Lucene
doesn't actually implement HNSW – the current algorithm isn't actually
hierarchical and only uses a single graph layer. Does this difference might
indicate we're leaving performance 'on the table' by not using layers, which (I
don't think) adds much index time or space? Or are there other algorithmic
improvements would help close the gap?
* Setting beamWidth to 500 *really* slowed down indexing. I'll open a separate
issue with indexing speed results, keeping this one focused on search.
was:
I hooked up our HNSW implementation to ann-benchmarks, a widely used repo for
benchmarking nearest neighbor search libraries against large datasets. I found
the results interesting and opened this issue to share and discuss. My
benchmarking code can be found
[here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy
to commit but I’m happy to help anyone get set up with it.
Approaches
* LuceneVectorsOnly: a baseline that only indexes vectors, and performs a
brute force scan to determine nearest neighbo