[ https://issues.apache.org/jira/browse/LUCENE-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558141#comment-17558141 ]

Alessandro Benedetti commented on LUCENE-10593:
-----------------------------------------------

Recent performance test results are in the Pull Request.
They show no evidence of a slowdown, so this refactor looks good to go to me.
Functional tests are all green.

Planning to continue discussions and merge next week.

> VectorSimilarityFunction reverse removal
> ----------------------------------------
>
>                 Key: LUCENE-10593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10593
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alessandro Benedetti
>            Priority: Major
>              Labels: vector-based-search
>
> org.apache.lucene.index.VectorSimilarityFunction#EUCLIDEAN similarity behaves
> in the opposite way to the other similarities:
> a higher similarity value means a greater distance. For this reason, it has
> been marked as "reversed", and a function is provided to map the similarity
> to a score (where higher means closer, as with all the other similarities).
> This counterintuitive behavior, for which I could find no apparent
> explanation (please correct me if I am wrong), has a lot of nasty side
> effects on code readability, especially when combined with the
> NeighborQueue, which has a "reversed" flag of its own.
> In addition, it complicates the use of the following pattern in HNSW
> searchers:
> Result Queue -> MIN HEAP
> Candidate Queue -> MAX HEAP
> The proposal in my Pull Request aims to:
> 1) make the Euclidean similarity return the score directly, in line with the
> other similarities, using the formula currently used to convert distance to
> score
> 2) simplify the code, removing the bound checker that is no longer necessary
> 3) refactor here and there to align with the simplification
> 4) refactor NeighborQueue to state clearly whether it is a MIN_HEAP or a
> MAX_HEAP; debugging is now much easier and the HNSW code is much more
> intuitive to understand

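As a minimal illustration of point 1, this Java sketch contrasts the two conventions. It assumes the "reversed" EUCLIDEAN value is the squared Euclidean distance and that the distance-to-score formula is 1 / (1 + squaredDistance); the class and method names are hypothetical and are not Lucene's VectorSimilarityFunction API.

```java
/** Hypothetical sketch; not Lucene's VectorSimilarityFunction. */
class EuclideanScoreSketch {

  // Squared Euclidean distance: 0 for identical vectors, grows as they move apart.
  static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  // Old convention (assumed): the "similarity" is the distance itself, so lower
  // is better ("reversed") and a separate step converts it into a score.
  static float reversedSimilarity(float[] a, float[] b) {
    return squareDistance(a, b);
  }

  static float convertToScore(float reversedSimilarity) {
    return 1f / (1f + reversedSimilarity);
  }

  // Proposed convention: return the score directly, so higher always means
  // closer, as with the other similarity functions.
  static float score(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  public static void main(String[] args) {
    float[] query = {0f, 0f};
    float[] near = {1f, 0f}; // squared distance 1 -> score 0.5
    float[] far = {3f, 4f};  // squared distance 25 -> score 1/26
    System.out.println(score(query, near) + " " + score(query, far));
  }
}
```

With the score computed directly, callers can always treat a larger value as a closer match, which is what makes the separate "reversed" handling unnecessary.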

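To illustrate the "Result Queue -> MIN HEAP, Candidate Queue -> MAX HEAP" pattern and the explicit heap order of point 4, here is a minimal Java sketch of a greedy best-first search over a single graph layer, built on java.util.PriorityQueue. The names (HeapPatternSketch, Order, Scored, searchLayer), the adjacency-list graph and the precomputed per-node score array are hypothetical simplifications; this is not Lucene's NeighborQueue or its HNSW searcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical sketch of the MIN HEAP / MAX HEAP pattern; not Lucene's NeighborQueue. */
class HeapPatternSketch {

  // Heap order is stated explicitly instead of through a "reversed" flag.
  enum Order { MIN_HEAP, MAX_HEAP }

  static final class Scored {
    final int node;
    final float score; // higher means closer to the query

    Scored(int node, float score) {
      this.node = node;
      this.score = score;
    }
  }

  static PriorityQueue<Scored> newQueue(Order order) {
    Comparator<Scored> byScore = Comparator.comparingDouble(s -> s.score);
    return new PriorityQueue<>(order == Order.MIN_HEAP ? byScore : byScore.reversed());
  }

  // graph.get(i) lists the neighbors of node i; score[i] is node i's score for the query.
  static List<Integer> searchLayer(List<int[]> graph, float[] score, int entry, int k) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be >= 1");
    }
    PriorityQueue<Scored> candidates = newQueue(Order.MAX_HEAP); // best candidate on top
    PriorityQueue<Scored> results = newQueue(Order.MIN_HEAP);    // worst kept result on top
    Set<Integer> visited = new HashSet<>();
    Scored start = new Scored(entry, score[entry]);
    candidates.add(start);
    results.add(start);
    visited.add(entry);
    while (!candidates.isEmpty()) {
      Scored best = candidates.poll();
      // Stop once the best remaining candidate cannot beat the worst kept result.
      if (results.size() >= k && best.score < results.peek().score) {
        break;
      }
      for (int neighbor : graph.get(best.node)) {
        if (!visited.add(neighbor)) {
          continue; // already seen
        }
        float s = score[neighbor];
        if (results.size() < k || s > results.peek().score) {
          Scored scored = new Scored(neighbor, s);
          candidates.add(scored);
          results.add(scored);
          if (results.size() > k) {
            results.poll(); // evict the current worst result (top of the min-heap)
          }
        }
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!results.isEmpty()) {
      out.add(results.poll().node); // the min-heap drains worst-first
    }
    Collections.reverse(out); // best node first
    return out;
  }

  public static void main(String[] args) {
    // Tiny hypothetical graph: a path 0 - 1 - 2 - 3 - 4.
    List<int[]> graph = List.of(
        new int[] {1}, new int[] {0, 2}, new int[] {1, 3}, new int[] {2, 4}, new int[] {3});
    float[] score = {0.1f, 0.3f, 0.5f, 0.9f, 0.7f};
    System.out.println(searchLayer(graph, score, 0, 2)); // prints [3, 4]
  }
}
```

Stating the heap order when each queue is created, rather than flipping a "reversed" flag, makes it obvious which end of each queue is being polled.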

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
