nitirajrathore commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1772314503

   Thanks @msokolov : These are really good suggestions. I will try to 
incorporate these ideas in solutions. I think in the end there can be multiple 
ways to allow more connectivity.
   
   I was also worried about how the connections are made and pruned. I checked 
[these 
lines](https://github.com/nitirajrathore/lucene/blob/f64bb19697708bfd91e05ff4314976c991f60cbc/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java#L293-L293)
 in our algorithm. Here we connect a node to the neighbour list only if is not 
closer to any of the current neighbour. Think of a situaion where for some 
nodes (say `a`) if the first neighbour connected (say `b`) has a good score but 
the rest of the candidate nodes have higher score with `b` than `a`. Then the 
rest of the candidates will NOT get connected to `a` AT ALL. Later if because 
of some other connection of `b`, say the non diverse connection of b -> a is 
removed then the `a` will get completely disconnected from graph. Basically we 
should have more nodes connected in the begining itself so that if some nodes 
are removed we do not run danger of disconnected graphs.
   
   To check this I added lot of code to collect the events (like `add`, 
`connect`, `disconnect`) and was able to reproduce it with just 10K graph size 
itself with max-conn 16. Example below. But the same is not reproduced with 
max-conn 32 or max-conn 64. Although, increasing max-conn in this case helped 
but the basic algorithm for connections does not seem very good to me.
   
   To fix this I checked that the original paper proposes the concept of 
[keepPrunedConnections](https://arxiv.org/pdf/1603.09320.pdf). I implemented 
that in my local and ran tests. But something is not right, I will keep 
checking if I did something wrong in the implementation etc. Will submit 
"draft" PR soon for others comments.
   
   I think there is another issue in current implementation. Look at [this 
code](https://github.com/nitirajrathore/lucene/blob/f64bb19697708bfd91e05ff4314976c991f60cbc/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java#L278-L278)
 where we finally connected edge from `b ---> a` and now if `b` has more 
connections then max-conn then we are removing one of the non diverse 
connection of `b` Say we remove the `b ---> c`. So we are removing this half of 
the undirected connection but I don't think we are removing the other half `c 
---> b` anywhere. This will leave inconsistent Graph. I tried fixing this as 
well, but I am getting some errors. I will try to fix and submit the draft PR.
   
   Note that adding the concept of `keepPrunedConnections` will not just impact 
these disconnected nodes but in general the connections of all the nodes will 
increase and will become close to 'max-conn'. I think this is a good property 
without much additional cost at the time of graph creation. But the search time 
may increase a bit since there will be more connections to explore for the same 
max-conn. But I think this will also increase the Recall with ExactKNN. I will 
run the performance tests once code complete to find those numbers.
   
   ---
   
   - Real example of disconnectedness
   In one of the 10K vector runs, I found 3 nodes were disconnected. namely 
297, 298, 5601
   I checked that 297 was initially connected to 293. After that it was 
compared to a lot of candidates but was not connected to any because their 
score with 293 was higher than their score with 297. For example here 297 was 
compared with 43 and 197 but was NOT connected to either because their score 
with 293 was higher. In this case 297 was left with just one connection (with 
293). Later on the connection of 293 ---> 297 also got deleted because of 
diversity checked triggered for other nodes connection with 293. 
   ```
   level=0,node=297,type=CONNECT,source=297,dest=293,otherdata=score=0.9578
   level=0,node=297,type=COMPARE,source=297,dest=43,otherdata=score=0.9541
   
level=0,node=297,type=COMPARE,source=43,dest=293,otherdata=neighborSimilarity=0.9816,
  score=0.9541
   level=0,node=297,type=COMPARE,source=297,dest=197,otherdata=score=0.9539
   
level=0,node=297,type=COMPARE,source=197,dest=293,otherdata=neighborSimilarity=0.9868,
  score=0.9539
   ......
   level=0,node=298,type=CONNECT,source=298,dest=297,otherdata=score=0.9854
   ....
   ```
   Final connections were, full 2-way connection 297 <---> 298 and one sided 
connection of 297---> 293 (left over connection).
   But there was no path to 297 and 298 from any other node as the other way 
connection from 293 ---> 297 got deleted.
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to