msokolov commented on PR #13566: URL: https://github.com/apache/lucene/pull/13566#issuecomment-2225606218
I tested using KnnGraphTester and in the process changed this to handle multiple levels (it just seemed to make sense from a consistency perspective). I also found there were a couple of bugs:

* the `M`/max-connections check was off by one
* added a call to `finish()` so it would actually do something!

Testing with 1M 256-dim documents, I got results that are within noise for recall, latency, and index time:

|test|recall|latency|index time|
|-----|-----|-----|-----|
| baseline | 0.927 | 2.32 | 164755 |
| baseline | 0.926 | 2.26 | 167982 |
| candidate | 0.924 | 2.16 | 159109 |
| candidate | 0.927 | 2.25 | 169839 |

I printed out the graph connectivity and observed that many singleton nodes (nodes not connected to any other node) were produced in the baseline case. These were largely eliminated in the candidate, although I did still see some on level 1 in some segments in one of my tests. I also printed out the time taken to do the connection: it was about 3s in a 3-minute indexing run.

Overall I think this is worth doing in order to improve sanity. It would still be better if we could *guarantee* that the graphs are fully connected. That is challenging, though, since we assume nodes will also comply with the max-connection limit, and in theory every node in a component could be maximally connected, which would not allow for any further connection. So I think I will put this off for some future change and opt for P not P.

Still need to address the concurrent graph builder.
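To illustrate the connectivity check and the max-connection obstacle described above, here is a minimal Python sketch. It is not the Lucene implementation: it assumes a single-level, undirected graph stored as a list of neighbor sets, and `components`, `connect_components`, and `max_conn` are hypothetical names chosen for illustration.

```python
from collections import deque


def components(neighbors):
    """Return the connected components of an undirected graph.

    `neighbors` is a list where neighbors[i] is the set of nodes linked to i.
    A singleton node shows up as a component of size 1.
    """
    seen = set()
    result = []
    for start in range(len(neighbors)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        comp = []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nbr in neighbors[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result.append(comp)
    return result


def connect_components(neighbors, max_conn):
    """Try to link every smaller component to the largest one.

    A link is added only between nodes that have strictly fewer than
    `max_conn` neighbors, so the connection limit is never exceeded.
    Returns the number of links added.
    """
    comps = sorted(components(neighbors), key=len, reverse=True)
    if not comps:
        return 0
    main = list(comps[0])
    added = 0
    for comp in comps[1:]:
        src = next((n for n in comp if len(neighbors[n]) < max_conn), None)
        dst = next((n for n in main if len(neighbors[n]) < max_conn), None)
        if src is None or dst is None:
            # Every candidate node is saturated: this component stays
            # disconnected, which is why full connectivity is not guaranteed.
            continue
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        main.extend(comp)
        added += 1
    return added


# A pair, a singleton, and another pair: all can be joined with max_conn=2.
graph = [{1}, {0}, set(), {4}, {3}]
links_added = connect_components(graph, max_conn=2)

# Two triangles with max_conn=2: every node is already saturated.
saturated = [{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}]
saturated_links = connect_components(saturated, max_conn=2)
```

In the first graph the singleton and the extra pair get attached to the largest component. In the second, no node has a free slot, so nothing can be linked without breaking the limit, which is the theoretical case mentioned above.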