Re: [PR] gh-12627: HnswGraphBuilder connects disconnected HNSW graph components [lucene]

via GitHub Fri, 12 Jul 2024 06:35:26 -0700


msokolov commented on PR #13566:
URL: https://github.com/apache/lucene/pull/13566#issuecomment-2225606218


   I tested using KnnGraphTester and in the process changed this to handle 
multiple levels (it just seemed to make sense from a consistency perspective). 
I also found there were a couple of bugs:
   * the `M`/max-connections check was off-by-one
   * added a call to `finish()` so it would actually do something!
   
   Testing w/1M 256-dim documents I got results that are within noise for 
recall, latency, and index time:
   
   |test|recall|latency|index time|
   |-----|-----|-----|-----|
   | baseline| 0.927 | 2.32 | 164755 |
   | baseline| 0.926 | 2.26 | 167982 |
   | candidate| 0.924 | 2.16 | 159109 |
   | candidate| 0.927 | 2.25 | 169839 |
   
   I printed out the graph connectivity and observed that there were many 
singleton nodes produced (nodes not connected to any other node) in the 
baseline case and these were largely eliminated in the candidate, although I 
did still see some on level 1 on one of my tests in some segments.
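
   For readers who want to reproduce this kind of connectivity printout, here is a minimal sketch of how components and singletons on one level could be counted. It is not the code used in this PR: `ConnectivityStats` and `componentsAndSingletons` are hypothetical names, the graph is a plain `int[][]` adjacency list rather than a Lucene graph object, and links are assumed to be symmetric (undirected), which is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not part of Lucene): connectivity stats for one graph level. */
public class ConnectivityStats {

  /**
   * Returns {componentCount, singletonCount} for an adjacency list.
   * Assumes links are symmetric: if a lists b as a neighbor, b lists a.
   */
  public static int[] componentsAndSingletons(int[][] neighbors) {
    int n = neighbors.length;
    boolean[] seen = new boolean[n];
    int components = 0;
    int singletons = 0;
    Deque<Integer> stack = new ArrayDeque<>();
    for (int start = 0; start < n; start++) {
      if (seen[start]) {
        continue; // already assigned to an earlier component
      }
      components++;
      int size = 0;
      seen[start] = true;
      stack.push(start);
      // Depth-first traversal of everything reachable from start.
      while (!stack.isEmpty()) {
        int node = stack.pop();
        size++;
        for (int nbr : neighbors[node]) {
          if (!seen[nbr]) {
            seen[nbr] = true;
            stack.push(nbr);
          }
        }
      }
      if (size == 1) {
        singletons++; // a node with no links to any other node
      }
    }
    return new int[] {components, singletons};
  }

  public static void main(String[] args) {
    // Toy level: a chain 0-1-2, an isolated node 3, and a pair 4-5.
    int[][] level = {{1}, {0, 2}, {1}, {}, {5}, {4}};
    int[] stats = componentsAndSingletons(level);
    System.out.println("components=" + stats[0] + " singletons=" + stats[1]);
  }
}
```

   On the toy level above this reports three components, one of which is a singleton. Running the same count per level and per segment, before and after the change, is one way to obtain a comparison like the one described here.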
   
   I also printed out the time taken to do the connection - it was about 3s in 
a 3m indexing run. Overall I think this is worth doing in order to improve 
sanity.
   
   It would still be better if we could *guarantee* that the graphs are fully 
connected. It is challenging though, since we assume nodes will also comply with 
the max-connection limit, and in theory every node in a component could be 
maximally connected, which would not allow for any further connection, so I 
think I will put this off for the future and opt for P not P. 
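
   To illustrate the saturation problem with a toy case, the sketch below checks whether any node in a component still has room for another link. It is an assumption-laden illustration, not the builder's logic: `SaturationCheck`, `hasSpareCapacity`, and the `maxConn` parameter are hypothetical, and the graph is again a plain symmetric `int[][]` adjacency list.

```java
/** Hypothetical illustration (not part of Lucene) of a component with no spare link capacity. */
public class SaturationCheck {

  /** True if at least one node of the component has fewer than maxConn neighbors. */
  public static boolean hasSpareCapacity(int[][] neighbors, int[] component, int maxConn) {
    for (int node : component) {
      if (neighbors[node].length < maxConn) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Nodes 0,1,2 form a triangle; nodes 3,4 form a pair. The limit is 2 links per node.
    int[][] neighbors = {{1, 2}, {0, 2}, {0, 1}, {4}, {3}};
    int maxConn = 2;
    // Every node of the triangle already has 2 links, so no link can be added to it.
    System.out.println(hasSpareCapacity(neighbors, new int[] {0, 1, 2}, maxConn));
    // The pair still has room, but it has no partner in the triangle to link to.
    System.out.println(hasSpareCapacity(neighbors, new int[] {3, 4}, maxConn));
  }
}
```

   In this toy graph the triangle is saturated, so joining the two components would require either exceeding the limit or dropping an existing link, which is why a hard connectivity guarantee is difficult under a strict max-connection limit.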
   
   Still need to address the concurrent graph builder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

