[ https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236727#comment-17236727 ]
Michael Sokolov commented on LUCENE-9614:
-----------------------------------------

OK, thought about this a bit, and I guess I see the point a little better. This query is weird because if (say) we were to add some new vectors to the index, suddenly a vector that previously matched might no longer match. I guess I have been thinking of a Query as a convenience for plugging in to the typical scoring / execution framework provided by IndexSearcher.

Let me sketch out the use case I have in mind, because I'm not sure how we would handle it in the non-Query implementation(s). We'd like to be able to blend matches derived from postings (full text search) along with matches derived from vectors, using some kind of scoring function that balances vector scores and text relevance scores. Both kinds of matches also need to satisfy other constraints, embodied in a Query.

If we present KNN matches as a Query, I think this can all be done by the Collectors in the usual way, but if we have a different API, say something on IndexSearcher, or a static method on a KNN class, then that blending will require its own custom implementation - I think?

> Implement KNN Query
> -------------------
>
>                 Key: LUCENE-9614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9614
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael Sokolov
>            Priority: Major
>
> Now we have a vector index format, and one vector indexing/KNN search
> implementation, but the interface is low-level: you can search across a
> single segment only. We would like to expose a Query implementation.
> Initially, we want to support a usage where the KnnVectorQuery selects the
> k-nearest neighbors without regard to any other constraints, and these can
> then be filtered as part of an enclosing Boolean or other query.
> Later we will want to explore some kind of filtering *while* performing
> vector search, or a re-entrant search process that can yield further results.
> Because of the nature of knn search (all documents having any vector value
> match), it is more like a ranking than a filtering operation, and it doesn't
> really make sense to provide an iterator interface that can be merged in the
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy
> a query that is "k nearest neighbors satisfying some arbitrary Query", at
> least not without realizing a complete bitset for the Query. But this is for
> a later issue; *this* issue is just about performing the knn search in
> isolation, computing a set of (some given) K nearest neighbors, and providing
> an iterator over those.
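
As a rough illustration of the behavior discussed in this thread, here is a minimal, self-contained Java sketch. It does not use Lucene and is not the implementation proposed in the issue: the class KnnThenFilterSketch, its methods topK and filter, the brute-force search, and the squared Euclidean distance are all assumptions made for the example. It shows two things: selecting the k nearest vectors first and filtering afterwards, and how adding a vector to the index can push a previously matching document out of the top k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: brute-force top-k over in-memory vectors, then a post-filter.
final class KnnThenFilterSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Doc ids (array positions) of the k vectors closest to the target, nearest first. */
    static List<Integer> topK(float[][] vectors, float[] target, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int doc = 0; doc < vectors.length; doc++) {
            ids.add(doc);
        }
        // List.sort is stable, so equal distances keep docid order.
        ids.sort(Comparator.comparingDouble(
                (Integer doc) -> squaredDistance(vectors[doc], target)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    /** Applies an arbitrary constraint to hits that were already selected. */
    static List<Integer> filter(List<Integer> hits, IntPredicate accept) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (accept.test(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] target = {0f, 0f};

        float[][] index = {{1f, 0f}, {0f, 2f}, {3f, 3f}};
        System.out.println(topK(index, target, 2)); // [0, 1]

        // Adding one closer vector displaces doc 1 from the top 2.
        float[][] grown = {{1f, 0f}, {0f, 2f}, {3f, 3f}, {0.5f, 0f}};
        System.out.println(topK(grown, target, 2)); // [3, 0]

        // Filtering after selection can leave fewer than k hits.
        System.out.println(filter(topK(grown, target, 2), doc -> doc % 2 == 0)); // [0]
    }
}
```

The last call returns a single hit even though k is 2, which is the gap the description points at: filtering after the search does not produce "k nearest neighbors satisfying some arbitrary Query".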