mikemccand commented on issue #13699: URL: https://github.com/apache/lucene/issues/13699#issuecomment-2374865202
> This is a similar conversation around folks adding `efSearch` as a parameter. Or possibly custom kNN collector behavior (your own early stopping logic...).

Is `efSearch` (a Faiss parameter) like the index-time `beamWidth`, but it applies to search-time? Like it's the size N of the priority queue holding most promising vectors so far, such that N can be bigger than the eventual K that are finally returned? Lucene does not expose this today? Maybe it should be called `searchBeamWidth` if this logic is right :)

So the workaround for users now is to just ask for a larger top N and discard all but the top K in the end? [`knnPerfTest`](https://github.com/mikemccand/luceneutil/blob/main/src/python/knnPerfTest.py) (for benchmarking Lucene HNSW/KNN in luceneutil) seems to take this approach (it calls it `fanout`).

Another example might be `interval` from the `MultiLeafKnnCollector` -- crazy expert level parameter. Though that one is more a tradeoff of concurrency vs total CPU cost. Greediness seems more application / vector / model dependent and quite fundamental to this algorithm.

> A more opaque "hnsw knn search parameters" might be a better way, with a simple interface that is accepted in the queries. But adding an individual parameter for all the things you can do during query time with graph exploration would be complicated.

+1 it'd be nice to find some simple way to allow "expert" tunables to be set rather than one at a time poking them through the APIs ad-hoc. Though, lacking such a general mechanism also shouldn't block progress...

> FWIW, this greediness is honestly focused on graph based indices only. If we ever did another type of index, its behavior would change.

But let's design for today? We shouldn't let a possible future path ("maybe we some day expose non-graph-based approximate KNN") limit (too much) what we expose today?
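To make the "ask for a larger top N and discard all but the top K" workaround concrete, here is a minimal toy sketch in Python. It is not Lucene's or Faiss's implementation: `beam_search`, `search_top_k`, the `fanout` argument, and the four-node graph are all hypothetical and exist only for illustration. It assumes a single graph layer, squared Euclidean distance, and a result queue capped at `beam_width = k + fanout`.

```python
import heapq


def beam_search(graph, vectors, entry, query, beam_width):
    """Best-first search over one graph layer, keeping at most beam_width results.

    Toy illustration only; all names here are hypothetical.
    Returns (distance, node) pairs sorted from closest to farthest.
    """
    def dist(i):
        # Squared Euclidean distance between node i and the query.
        return sum((a - b) ** 2 for a, b in zip(vectors[i], query))

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]    # max-heap via negation: worst kept hit on top
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the queue is full and the best candidate is worse than the worst kept hit.
        if len(results) >= beam_width and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < beam_width or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)  # evict the current worst hit
    return sorted((-nd, n) for nd, n in results)


def search_top_k(graph, vectors, entry, query, k, fanout=0):
    """Search with a queue of size k + fanout, then keep only the best k nodes."""
    hits = beam_search(graph, vectors, entry, query, beam_width=k + fanout)
    return [node for _, node in hits[:k]]


# Toy 1-D data: node 3 is the true nearest neighbor of the query,
# but it is only reachable through node 2, which looks worse than node 1.
vectors = {0: (10.0,), 1: (6.0,), 2: (7.0,), 3: (0.0,)}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = (0.0,)

greedy_hit = search_top_k(graph, vectors, 0, query, k=1)           # queue of size 1
wider_hit = search_top_k(graph, vectors, 0, query, k=1, fanout=1)  # queue of size 2
print(greedy_hit, wider_hit)
```

With a queue of size 1 the search settles on node 1 and never explores node 2, so it misses node 3. With one extra slot it keeps node 2 around long enough to reach node 3, and the final truncation still returns a single hit. In this toy, that is the whole effect of a search-time queue that is larger than K.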