benwtrent commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929537735

   > My question is why add this function when it's not that much faster than 
integer dot product?
   
   Because it produces different scores. Integer dot product measures the 
angle between vectors, so it doesn't yield the same values, and it doesn't 
work for binary-encoded data (as opposed to a euclidean-style bit distance).
   
   Hamming distance is more like `euclidean`. It is possible to do "hamming 
distance things" now, if users supply literal `[0, 1, 0, 1, 1...]` vectors and 
use `euclidean`, but this has obvious drawbacks (8x more vector operations and 
vector dims are 8x bigger).
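
To make the equivalence concrete, here is a minimal sketch (plain Python, not Lucene code; `pack_bits`, `hamming_packed`, `squared_euclidean` and `dot` are hypothetical helper names used only for illustration). It assumes one bit per dimension, packed 8 dimensions per byte. On literal 0/1 vectors the squared euclidean distance equals the hamming distance, while the dot product gives an unrelated value:

```python
def pack_bits(bits):
    """Pack a list of 0/1 ints into bytes, 8 dimensions per byte (MSB first)."""
    if len(bits) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | (b & 1)
        out.append(byte)
    return bytes(out)


def hamming_packed(a, b):
    """Hamming distance over packed bytes: XOR, then count the set bits."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def squared_euclidean(u, v):
    """Squared euclidean distance over unpacked vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def dot(u, v):
    """Plain integer dot product over unpacked vectors."""
    return sum(x * y for x, y in zip(u, v))


# Two 8-dimensional binary vectors written out as literal 0/1 values.
u = [0, 1, 0, 1, 1, 0, 0, 1]
v = [1, 1, 0, 0, 1, 0, 1, 0]

packed_u, packed_v = pack_bits(u), pack_bits(v)

print(len(u), len(packed_u))               # 8 unpacked dims vs 1 packed byte
print(squared_euclidean(u, v))             # matches the hamming distance
print(hamming_packed(packed_u, packed_v))
print(dot(u, v))                           # a different score entirely
```

The unpacked form stores and compares eight values where the packed form handles a single byte, which is where the 8x overhead comes from.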
   
   And before you suggest "let's remove `euclidean` then", the two are not 
compatible except when users provide literal `1s/0s`.
   
   > The issue is that folks just want to add, add, add these functions yet 
there are no ways to remove any function from this list ( they will scream 
"bwc" ).
   
   If you are against this & will block it, then we need to provide a clean way 
for users to introduce their own similarities.
   
   I suggested making similarities pluggable in the past, but got shot down.
   
   > A good way to get in a new function would be to actually improve our 
support o&m by removing a horribly performing one such as cosine first. That 
way we are actually improving rather than just piling on more code.
   
   If hamming and cosine were comparable, then sure. But they are not. 
   
   I do agree cosine should probably be removed, not because of hamming 
distance, but because dot_product exists.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
