irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching 
the nearest one neighbor in higher layers
URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-592287771
 
 
   > I believe in practice that results. max size is always set to ef, so there 
shouldn't be any real issue. I agree that the interface doesn't make that 
plain; we should enforce this invariant by API contract
   
   Hi, @msokolov, I agree that the max size is always set to _ef_, but _ef_ has 
different values in different layers. 
   According to **Algorithm 5** of Yury's 
[papar](https://arxiv.org/pdf/1603.09320.pdf),  HNSW searches the nearest one 
neighbor (namely, _ef_=1) from the top layer to the 1st layer,  and then finds 
the nearest _ef_ (_ef_=topK) neighbors from layer 0. In the implementation of 
Lucene HNSW, the actual size of result queue (Line 64, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java))
 is set to _ef_=topK when searching from top layer to the 1st layer while 
expected neighbor size is 1, result in finding more neighbors than expected. 
Even if the parameter _ef_ is set to 1 in Line 66, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java),
 the condition `if (dist < f.distance() || results.size() < ef)` (Line 87, 
[HNSWGraph](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraph.java))
 allows inserting more than 1 neighbor to the "results" queue when `dist < 
f.distance()` and `results.size() >= ef` (here _ef_=1, corresponding to  Line 
66, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java))
 because the max size of "results" is topK, which implies that the actual size 
of "results" queue belongs to [1, topK].
   
   **The simplest way to verify this problem is to print the actual size of 
neighbors**. For example, add "System.out.println(neighbors.size());" after 
"visitedCount += hnsw.searchLayer(query, neighbors, 1, l, vectorValues);" (Line 
66, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)),
 where the nearest one neighbor is expected, but the printed neighbor size 
would be range from 1~topK. Which also applies to 
[HNSWGraphWriter](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphWriter.java).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to