alessandrobenedetti opened a new pull request, #926:
URL: https://github.com/apache/lucene/pull/926

   (https://issues.apache.org/jira/browse/LUCENE-10593)
   
   The org.apache.lucene.index.VectorSimilarityFunction#EUCLIDEAN similarity 
behaves in the opposite way to the other similarities: a higher similarity 
value means a greater distance. For this reason it has been marked as 
"reversed", and a function is present to map the similarity to a score (where 
higher means closer, as in all the other similarities).
   This counterintuitive behavior, for which I could find no apparent 
explanation (please correct me if I am wrong), brings a lot of nasty side 
effects for code readability, especially when combined with the NeighborQueue, 
which has a "reversed" flag of its own.
   In addition, it complicates the use of the following pattern in the HNSW 
searchers:
   Result Queue -> MIN HEAP
   Candidate Queue -> MAX HEAP
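
   As a rough illustration of the queue pattern above, here is a minimal 
sketch that assumes every similarity returns a score where higher means closer. 
It uses java.util.PriorityQueue rather than Lucene's NeighborQueue, and the 
class and method names (HeapPatternSketch, topK, bestCandidate) are 
hypothetical, not part of the Lucene code base.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: scores are assumed "higher = closer".
final class HeapPatternSketch {

  /** Result queue as a MIN HEAP: the head is the weakest of the k retained scores. */
  static float[] topK(float[] scores, int k) {
    if (k <= 0) {
      return new float[0];
    }
    PriorityQueue<Float> results = new PriorityQueue<>(); // natural order = min-heap
    for (float s : scores) {
      if (results.size() < k) {
        results.add(s);
      } else if (s > results.peek()) {
        results.poll(); // evict the current weakest result
        results.add(s);
      }
    }
    // Drain weakest-first while filling from the back, so the output is best-first.
    float[] out = new float[results.size()];
    for (int i = out.length - 1; i >= 0; i--) {
      out[i] = results.poll();
    }
    return out;
  }

  /** Candidate queue as a MAX HEAP: the head is the most promising candidate. */
  static float bestCandidate(float[] scores) {
    if (scores.length == 0) {
      throw new IllegalArgumentException("no candidates");
    }
    PriorityQueue<Float> candidates = new PriorityQueue<>(Comparator.reverseOrder());
    for (float s : scores) {
      candidates.add(s);
    }
    return candidates.poll();
  }
}
```

   With a single score direction, the result queue always evicts from its head 
and the candidate queue always explores from its head, with no per-similarity 
reversal.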
   The proposal in my Pull Request aims to:
   1) make the Euclidean similarity return the score directly, in line with the 
other similarities, using the formula currently used to convert a distance 
into a score
   2) simplify the code by removing the bound checker, which is no longer 
necessary
   3) refactor here and there to be in line with the simplification
   4) refactor NeighborQueue to state clearly whether it is a MIN_HEAP or a 
MAX_HEAP, which makes debugging much easier and the HNSW code much more 
intuitive to understand
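
   To make point 1 concrete, here is a minimal sketch of a Euclidean 
similarity that returns a score directly. It assumes the distance-to-score 
mapping is 1 / (1 + squared distance); this description does not spell out the 
formula, so treat it as an assumption. The class and method names are 
hypothetical and this is not the actual VectorSimilarityFunction code.

```java
// Hypothetical sketch, not Lucene code.
final class EuclideanScoreSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Score in (0, 1], where higher means closer.
   * Assumed mapping: 1 / (1 + squared distance).
   */
  static float score(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }
}
```

   Identical vectors score 1, and the score decreases toward 0 as the distance 
grows, so callers can rank Euclidean results in the same direction as the other 
similarities.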


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

