[ https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385408#comment-17385408 ]
Julie Tibshirani commented on LUCENE-10015:
-------------------------------------------

I think it makes sense to keep the ability to configure the similarity function at the field level. I don't see it as a very 'expert' option: depending on what the vectors represent and how they have been processed, choosing the right similarity function is necessary to get good results. Also, unlike 'maxConn' and 'beamWidth' (which were specific to our HNSW implementation), the similarity function is a concept that applies to NN algorithms in general. To the best of my knowledge, many NN algorithms can handle the full set of common similarity functions (Euclidean, dot product, cosine).

In case it's helpful context: we currently support only Euclidean and cosine distance, and supporting both is technically redundant. For cosine similarity, users could normalize their vectors to unit length and use Euclidean distance instead. But I'm assuming we'll add support for inner product too, which seems very popular and cannot be expressed in terms of Euclidean distance. The FAISS library currently supports only Euclidean distance and inner product.

> Remove VectorValues.SimilarityFunction, remove NONE
> ---------------------------------------------------
>
>                 Key: LUCENE-10015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10015
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 9.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec
> parameter.
> The NONE option should be removed: it just makes the codec more complex.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
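The remark that cosine similarity is technically redundant with Euclidean distance rests on the identity ||a - b||^2 = 2 - 2 * cos(a, b) for unit-length vectors. Below is a minimal, self-contained Python sketch that checks this on toy data. It uses plain lists and no Lucene or FAISS APIs; every function name and the sample vectors are hypothetical, chosen only for illustration. It shows that ranking normalized vectors by ascending Euclidean distance gives the same order as ranking the raw vectors by descending cosine similarity, while ranking unnormalized vectors by inner product can give a different order than ranking them by Euclidean distance.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner (dot) product: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: the dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v: Vector) -> List[float]:
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def rank(query: Vector, docs: List[Vector],
         score: Callable[[Vector, Vector], float],
         higher_is_better: bool) -> List[int]:
    """Return document indices ordered from best match to worst."""
    return sorted(range(len(docs)),
                  key=lambda i: score(query, docs[i]),
                  reverse=higher_is_better)


# Hypothetical toy data, not taken from the issue.
query = [1.0, 2.0, 0.5]
docs = [[3.0, 1.0, 0.0], [0.5, 4.0, 1.0], [10.0, 10.0, 10.0], [-1.0, 0.5, 2.0]]

# Cosine on raw vectors vs. Euclidean distance on unit-length vectors.
by_cosine = rank(query, docs, cosine, higher_is_better=True)
unit_query = normalize(query)
unit_docs = [normalize(d) for d in docs]
by_unit_euclidean = rank(unit_query, unit_docs, euclidean, higher_is_better=False)

# For unit vectors, ||a - b||^2 == 2 - 2 * cos(a, b), so the two orders agree.
identity_holds = all(
    abs(euclidean(unit_query, u) ** 2 - (2 - 2 * cosine(query, d))) < 1e-9
    for u, d in zip(unit_docs, docs)
)

# Inner product vs. Euclidean distance on raw (unnormalized) vectors.
by_dot = rank(query, docs, dot, higher_is_better=True)
by_euclidean = rank(query, docs, euclidean, higher_is_better=False)

print(by_cosine, by_unit_euclidean, identity_holds)  # [1, 2, 0, 3] [1, 2, 0, 3] True
print(by_dot, by_euclidean)  # [2, 1, 0, 3] [1, 0, 3, 2]
```

On this toy data the inner-product order puts the large vector [10, 10, 10] first while raw Euclidean distance puts it last, because the inner product also rewards vector magnitude. This only illustrates that the two rankings can diverge on unnormalized vectors; it says nothing about how Lucene or FAISS implement these metrics internally.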