msokolov commented on PR #14823:
URL: https://github.com/apache/lucene/pull/14823#issuecomment-3006267677

   The right default value for beam width is hard to pin down. Publicly 
reported values vary widely. Try asking Gemini for the right default and it 
refuses to be pinned down! Unfortunately it depends: on the application setup 
(does it have a fancy second-stage ranker, or do we rely on the vector scores 
directly for ranking?) and on the vector data set (its dimensions, the size of 
the index).  We're finding that "high quality" vectors might require lower 
values for these exploration parameters, and applications that gather lots of 
semantic hits and then later re-rank them might be less sensitive to lower 
recall and can tolerate cheaper indexing settings.  You can see that these 
sources recommend values like 16, 64, 128, and 256: 
   
   * https://opensearch.org/blog/a-practical-guide-to-selecting-hnsw-hyperparameters/
   * https://www.pinecone.io/learn/series/faiss/hnsw/
   
   Maybe a good rule of thumb would be to pick the place where the slope of 
these latency/recall curves is 45 degrees? But of course that will depend on 
the units of the chart.
   
   My view is that this is something we could detune a bit, at least as a 
default.  Maybe we could retreat to 64?  But anyway, I think this issue is 
about merge policy segment counts, so we should probably open a new issue if 
we want to change that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


