searchivarius commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1631795367
Thank you, @jmazanec15. There is also an unpublished paper that benchmarked HNSW for maximum inner product search, and it worked just fine. In my thesis, I benchmarked SW-graph (which is pretty much HNSW when it comes to the peculiarities of handling inner product search) using an inner-product-like similarity (a fusion of BM25 and MODEL1 scores), and it was fine as well. [See the black asterisk run in Figure 3.2](http://boytsov.info/pubs/thesis_boytsov.pdf).

Moreover, HNSW and SW-graph were tested with non-metric similarities (see again my thesis and the references therein, as well as Yury Malkov's HNSW paper). These methods established SOTA results as well. There is also an extract from the thesis (published separately) [that focuses specifically on search with non-metric similarities. Again, things just work.](https://arxiv.org/pdf/1910.03534.pdf)

One may wonder why, right? I think that for real datasets the quirky distances don't deviate from Euclidean distances all that much, so the minimal set of geometric properties required for graph-based retrieval is preserved (and no, I don't think the triangle inequality is required). Specifically, for inner product search the outcomes are pretty close (in many cases) to the outcomes obtained when the inner product is replaced with cosine similarity. Why? Because the magnitudes of the vectors don't vary all that much.

That said, there are of course degenerate cases (I know one, but embedding models don't produce such weirdness) where HNSW won't work with MIPS (or rather, recall will be low). However, I am not aware of any realistic one. If you have some interesting examples of real datasets where direct application of HNSW/SW-graph fails, I would love to have a look.
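The magnitude argument above can be illustrated with a rough sketch. It is not taken from the thesis or the papers linked above: the data are synthetic, the dimensions and norm ranges are arbitrary assumptions, and the helper names (`make_vectors`, `mean_overlap`) are hypothetical. It compares exact top-10 rankings under inner product and under cosine similarity; it does not build an HNSW graph or measure its recall.

```python
import math
import random


def make_vectors(n, dim, norm_low, norm_high, rng):
    """Random directions, each rescaled to a norm drawn uniformly from [norm_low, norm_high]."""
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        length = math.sqrt(sum(x * x for x in v))
        scale = rng.uniform(norm_low, norm_high) / length
        vectors.append([x * scale for x in v])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Indices of the k highest scores, as a set."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return set(order[:k])


def mean_overlap(data, queries, k=10):
    """Average fraction of the exact top-k shared by inner product and cosine rankings."""
    norms = [math.sqrt(dot(v, v)) for v in data]
    total = 0.0
    for q in queries:
        ip = [dot(q, v) for v in data]
        # The query norm is a constant factor per query, so dividing by the
        # data norms alone gives the same ranking as cosine similarity.
        cos = [s / n for s, n in zip(ip, norms)]
        total += len(top_k(ip, k) & top_k(cos, k)) / k
    return total / len(queries)


rng = random.Random(0)
queries = make_vectors(20, 32, 1.0, 1.0, rng)
# Assumed "realistic" case: norms within 5% of 1.
narrow = mean_overlap(make_vectors(2000, 32, 0.95, 1.05, rng), queries)
# Assumed "degenerate" case: norms spread over two orders of magnitude.
wide = mean_overlap(make_vectors(2000, 32, 0.1, 10.0, rng), queries)
print(f"top-10 overlap, narrow norms: {narrow:.2f}")
print(f"top-10 overlap, wide norms:   {wide:.2f}")
```

When the norms are nearly constant, the two rankings should largely agree; when the norms vary widely, the inner product ranking is driven mostly by magnitude and the overlap drops. Running the same comparison on a real embedding set would be a more meaningful check than this synthetic one.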
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org