irvingzhang opened a new pull request #1340: Jira/lucene-9004: Refactor the 
implementation of HNSW following Faiss
URL: https://github.com/apache/lucene-solr/pull/1340
 
 
   There were some misunderstandings in the implementation of HNSW algorithm, 
making the recall of Lucene HNSW about [5% 
lower](https://github.com/jtibshirani/ann-benchmarks/pull/1) than Faiss. This 
PR is meant to implement the HNSW algorithm in a way similar to Faiss. Here're 
some reasons that why I consider refactoring the implementation,
   1) The major target is to improve the recall percent and develop a correct 
implementation;
   2) Actually , the relationship between neighborhood is not symmetric. Taking 
the following picture for example, assume that max connection = 2, for Node3 
its nearest neighbors are Node1 and Node2, but the nearest neighbors of Node1 
are Node2 and Node4. So I think the connectNodes and shrink operations in 
current implementation is incorrect!
   
![image](https://user-images.githubusercontent.com/8521429/76395621-c04fb880-63b2-11ea-9a35-bcce5ac95e84.png);
   3) When insert a node at layer l, HNSW tries to greedy search the nearest 
one Nnearest node from max level layer to (l + 1) layer. Starting from layer l 
to layer 0, HNSW uses Nnearest rather than all the neighbors of previous layer 
to search efContruction nearest neighbors. In Faiss, Nnearest keeps unchanged 
from layer l to layer 0, while re-selecting the nearest node in each layer in 
nmslib. I followed the implementation of [Faiss 
HNSW](https://github.com/facebookresearch/faiss/blob/master/impl/HNSW.cpp) 
since it's much easier and more efficient.
   4) The neighborhood shrinkage strategy is incorrect. It should follow 
triangle inequality.
   
   It is likely not the best implementation. Further suggestions and 
discussions are welcomed.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to