mikemccand commented on issue #13699:
URL: https://github.com/apache/lucene/issues/13699#issuecomment-2374865202

   > This is a similar conversation around folks adding `efSearch` as a 
parameter. Or possibly custom kNN collector behavior (your own early stopping 
logic...).
   
   Is `efSearch` (a Faiss parameter) like the index-time `beamWidth`, but 
applied at search time?  That is, is it the size N of the priority queue 
holding the most promising vectors so far, where N can be bigger than the K 
that are finally returned?  Lucene does not expose this today?  Maybe it should 
be called `searchBeamWidth` if this logic is right :)
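   
   To make the question concrete, here is a toy sketch of a best-first search over a single graph layer where the queue size (`ef`) is decoupled from the returned `k`. This is not Lucene's or Faiss's code; `beam_search`, `graph`, `dist`, and the sample data are all hypothetical names chosen for illustration.

```python
import heapq

def beam_search(graph, dist, entry, k, ef):
    """Best-first search over one graph layer, keeping the ef closest nodes seen."""
    ef = max(ef, k)                 # the queue can never be smaller than k
    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]      # min-heap: closest unexplored node first
    results = [(-d0, entry)]        # max-heap via negation: worst kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                   # best unexplored node is worse than worst kept
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # evict the current worst
    # Only the k closest of the ef kept nodes are returned.
    return sorted((-nd, n) for nd, n in results)[:k]

# Hypothetical 1-D data: node 1 looks best from the entry point but is a dead
# end; the true nearest neighbor (node 3) is only reachable through node 2.
positions = {0: 10.0, 1: 5.0, 2: 6.0, 3: 0.5}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = 0.0
dist = lambda n: abs(positions[n] - query)

greedy = beam_search(graph, dist, entry=0, k=1, ef=1)
wider = beam_search(graph, dist, entry=0, k=1, ef=2)
```

   On this toy graph, `ef=1` settles for node 1, while `ef=2` keeps node 2 around long enough to reach node 3, even though both calls return a single hit.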
   
   So the workaround for users now is to just ask for a larger top N and 
discard all but the top K in the end?  
[`knnPerfTest`](https://github.com/mikemccand/luceneutil/blob/main/src/python/knnPerfTest.py)
 (for benchmarking Lucene HNSW/KNN in luceneutil) seems to take this approach 
(it calls it `fanout`).
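   
   A minimal sketch of that workaround, assuming some `search_fn(query, n)` that returns its `n` best hits ordered best first. Both `search_with_fanout` and `search_fn` are hypothetical names, and adding `fanout` to `k` is my assumption about how the extra budget is applied.

```python
def search_with_fanout(search_fn, query, k, fanout):
    """Over-fetch k + fanout hits from the underlying search, then keep the top k.

    search_fn is a stand-in for any approximate KNN search; it is assumed to
    return at most n hits, ordered best first.
    """
    if k < 0 or fanout < 0:
        raise ValueError("k and fanout must be non-negative")
    hits = search_fn(query, k + fanout)   # larger top N widens the exploration
    return hits[:k]                       # discard all but the top K
```

   The cost of the wider search is still paid in full; only the result list is trimmed.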
   
   Another example might be `interval` from the `MultiLeafKnnCollector` -- 
a crazy expert-level parameter.  Though that one is more of a tradeoff of 
concurrency vs total CPU cost.  Greediness seems more application / vector / 
model dependent and quite fundamental to this algorithm.
   
   > A more opaque "hnsw knn search parameters" might be a better way, with a 
simple interface that is accepted in the queries. But adding an individual 
parameter for all the things you can do during query time with graph 
exploration would be complicated.
   
   +1 it'd be nice to find some simple way to allow "expert" tunables to be set 
rather than poking them through the APIs one at a time, ad hoc.  Though, lacking 
such a general mechanism also shouldn't block progress...
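   
   For illustration only, one shape such a mechanism could take is a single opaque parameters object that a query accepts. Nothing like this exists in Lucene as far as this thread says; `KnnSearchParams`, `search_beam_width`, and `expert` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass(frozen=True)
class KnnSearchParams:
    """Hypothetical bag of search-time tunables passed alongside a KNN query."""
    k: int
    search_beam_width: Optional[int] = None      # None means "just use k"
    expert: Dict[str, Any] = field(default_factory=dict)  # rarely used knobs

    def beam_width(self) -> int:
        # The queue can never usefully be smaller than the number of hits wanted.
        if self.search_beam_width is None:
            return self.k
        return max(self.k, self.search_beam_width)

# Common case: only k is set. Expert case: extra knobs ride along in one object.
simple = KnnSearchParams(k=10)
tuned = KnnSearchParams(k=10, search_beam_width=50, expert={"interval": 16})
```

   New tunables would then extend the object rather than every query constructor.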
   
   > FWIW, this greediness is honestly focused on graph based indices only. If 
we ever did another type of index, its behavior would change.
   
   But let's design for today?  We shouldn't let a possible future path ("maybe 
someday we expose non-graph-based approximate KNN") limit (too much) what we 
expose today?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


