searchivarius commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1631795367

   Thank you @jmazanec15: there's also an unpublished paper where HNSW was 
benchmarked for maximum inner product search (MIPS) and it was just fine. In my 
thesis, I benchmarked SW-graph (which is pretty much HNSW when it comes to the 
peculiarities of handling inner product search) using an inner-product-like 
similarity (a fusion of BM25 and MODEL1 scores) and it was fine. [See the black 
asterisk run in Figure 3.2](http://boytsov.info/pubs/thesis_boytsov.pdf).
   
   
   Moreover, HNSW and SW-graph were tested with non-metric similarities (see 
again my thesis and references therein) as well as Yury Malkov's HNSW paper. 
These methods established SOTA results as well.
   
   There is also an extract from the thesis (published separately) [that 
focuses specifically on search with non-metric 
similarities](https://arxiv.org/pdf/1910.03534.pdf). Again, things just work.
   
   One may wonder why, right? I think that for real datasets the quirky 
distances don't deviate from the Euclidean distance all that much, so the 
minimal set of geometric properties required for graph-based retrieval is 
preserved (and no, I don't think the triangle inequality is required).
   
   Specifically, for inner product search the outcomes are pretty close (in 
many cases) to the outcomes where the inner product is replaced with cosine 
similarity. Why? Because the magnitude of the vectors doesn't change all 
that much. 
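   To make the magnitude argument concrete, here is a minimal sketch (my 
illustration, not taken from the thesis or the papers). It relies only on the 
identity `<q, x> = |q| * |x| * cos(q, x)` and compares the exact, brute-force 
top-k by inner product with the exact top-k by cosine on synthetic data. No 
HNSW index is involved. The function name `overlap_ip_vs_cosine`, the 
`norm_spread` knob, and the random data are all assumptions made up for this 
example.

```python
import math
import random


def top_k(scores, k):
    """Return the indices of the k largest scores as a set."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def overlap_ip_vs_cosine(norm_spread, n=2000, dim=32, k=10, seed=0):
    """Fraction of the exact inner-product top-k also found in the cosine top-k.

    norm_spread is a hypothetical knob: every data vector gets a norm drawn
    uniformly from [1 - norm_spread, 1 + norm_spread].
    """
    rng = random.Random(seed)

    def random_unit():
        # Random direction: Gaussian vector scaled to unit length.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    query = random_unit()
    ip_scores, cos_scores = [], []
    for _ in range(n):
        direction = random_unit()
        norm = rng.uniform(1.0 - norm_spread, 1.0 + norm_spread)
        cos = sum(q * d for q, d in zip(query, direction))
        cos_scores.append(cos)
        # The query has unit norm, so <q, x> = |x| * cos(q, x).
        ip_scores.append(norm * cos)
    return len(top_k(ip_scores, k) & top_k(cos_scores, k)) / k


if __name__ == "__main__":
    for spread in (0.0, 0.05, 0.5):
        print(spread, overlap_ip_vs_cosine(spread))
```

   With `norm_spread=0.0` all norms are equal, so the two rankings coincide 
by construction. Increasing the spread shows how far the norms have to vary 
before the two top-k sets start to diverge on this toy data; it says nothing 
about any particular real dataset.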
   
   That said, there are of course degenerate cases (I know one, but embedding 
models don't produce such weirdness) where HNSW won't work with MIPS (or rather 
recall will be low). However, I am not aware of any realistic one. If you have 
some interesting examples of real datasets where direct application of 
HNSW/SW-graph fails, I would love to have a look.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

