nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1772314503
Thanks @msokolov : These are really good suggestions. I will try to incorporate these ideas in solutions. I think in the end there can be multiple ways to allow more connectivity. I was also worried about how the connections are made and pruned. I checked [these lines](https://github.com/nitirajrathore/lucene/blob/f64bb19697708bfd91e05ff4314976c991f60cbc/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java#L293-L293) in our algorithm. Here we connect a node to the neighbour list only if is not closer to any of the current neighbour. Think of a situaion where for some nodes (say `a`) if the first neighbour connected (say `b`) has a good score but the rest of the candidate nodes have higher score with `b` than `a`. Then the rest of the candidates will NOT get connected to `a` AT ALL. Later if because of some other connection of `b`, say the non diverse connection of b -> a is removed then the `a` will get completely disconnected from graph. Basically we should have more nodes connected in the begining itself so that if some nodes are removed we do not run danger of disconnected graphs. To check this I added lot of code to collect the events (like `add`, `connect`, `disconnect`) and was able to reproduce it with just 10K graph size itself with max-conn 16. Example below. But the same is not reproduced with max-conn 32 or max-conn 64. Although, increasing max-conn in this case helped but the basic algorithm for connections does not seem very good to me. To fix this I checked that the original paper proposes the concept of [keepPrunedConnections](https://arxiv.org/pdf/1603.09320.pdf). I implemented that in my local and ran tests. But something is not right, I will keep checking if I did something wrong in the implementation etc. Will submit "draft" PR soon for others comments. I think there is another issue in current implementation. Look at [this code](https://github.com/nitirajrathore/lucene/blob/f64bb19697708bfd91e05ff4314976c991f60cbc/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java#L278-L278) where we finally connected edge from `b ---> a` and now if `b` has more connections then max-conn then we are removing one of the non diverse connection of `b` Say we remove the `b ---> c`. So we are removing this half of the undirected connection but I don't think we are removing the other half `c ---> b` anywhere. This will leave inconsistent Graph. I tried fixing this as well, but I am getting some errors. I will try to fix and submit the draft PR. Note that adding the concept of `keepPrunedConnections` will not just impact these disconnected nodes but in general the connections of all the nodes will increase and will become close to 'max-conn'. I think this is a good property without much additional cost at the time of graph creation. But the search time may increase a bit since there will be more connections to explore for the same max-conn. But I think this will also increase the Recall with ExactKNN. I will run the performance tests once code complete to find those numbers. --- - Real example of disconnectedness In one of the 10K vector runs, I found 3 nodes were disconnected. namely 297, 298, 5601 I checked that 297 was initially connected to 293. After that it was compared to a lot of candidates but was not connected to any because their score with 293 was higher than their score with 297. For example here 297 was compared with 43 and 197 but was NOT connected to either because their score with 293 was higher. In this case 297 was left with just one connection (with 293). Later on the connection of 293 ---> 297 also got deleted because of diversity checked triggered for other nodes connection with 293. ``` level=0,node=297,type=CONNECT,source=297,dest=293,otherdata=score=0.9578 level=0,node=297,type=COMPARE,source=297,dest=43,otherdata=score=0.9541 level=0,node=297,type=COMPARE,source=43,dest=293,otherdata=neighborSimilarity=0.9816, score=0.9541 level=0,node=297,type=COMPARE,source=297,dest=197,otherdata=score=0.9539 level=0,node=297,type=COMPARE,source=197,dest=293,otherdata=neighborSimilarity=0.9868, score=0.9539 ...... level=0,node=298,type=CONNECT,source=298,dest=297,otherdata=score=0.9854 .... ``` Final connections were, full 2-way connection 297 <---> 298 and one sided connection of 297---> 293 (left over connection). But there was no path to 297 and 298 from any other node as the other way connection from 293 ---> 297 got deleted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org