zhaih commented on PR #15208:
URL: https://github.com/apache/lucene/pull/15208#issuecomment-3572538179

   @msokolov 
   > Also "I modified the algorithm a bit so that it does not depend on 
previous node being added to the graph when adding the rest of nodes that are 
outside of join set." makes me wonder if this might be part of the problem with 
recall? I'm not sure exactly what you meant (haven't reviewed the code), but it 
sounds as if we don't wait for all of the "covering set" to be indexed before 
beginning to add the remainder of the nodes?
   
   Let me explain a bit more :)
   So we do ensure the "covering set" to be fully indexed before the other part 
to be indexed, the one I mentioned is in [this 
line](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/MergingHnswGraphBuilder.java#L168)
   which says
   ```
           // if u's neighbour v is in the join set, or already added to gL (v 
< u),
           // then we add v's neighbours from gL to the candidate list
           if (v < u || j.contains(v)) {
   ```
   I removed the `if (v < u)` part so that we don't really add anything that 
are "supposed" to be added previously (because that won't be the case in 
concurrent setting)
   
   This are actually making the recall better, because the old benchmark 
(candidate) result is a more strict replication of the join set algorithm, 
which is showing even worse recall..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to