[I] Seeding HNSW Search [lucene]

via GitHub Tue, 06 Aug 2024 02:33:23 -0700


seanmacavaney opened a new issue, #13634:
URL: https://github.com/apache/lucene/issues/13634

### Description

In some vector search cases, users already know of documents that are likely
related to a query. Let's support seeding HNSW's scoring stage with these
documents, rather than using HNSW's hierarchical stage to find entry points.

An example use case is hybrid search, where both a traditional search and a
vector search are performed. The top results from the traditional search are
likely reasonable seeds for the vector search. Even when not performing hybrid
search, traditional matching can often be faster than traversing the hierarchy,
so seeding can speed up the vector search process (up to 2x faster for the same
effectiveness), as demonstrated in [this
article](https://arxiv.org/abs/2307.16779) (full disclosure: I'm an author of
the article).

This enhancement proposes adding a `seed` query, alongside the existing
`filter` query, to the KNN query classes. The results of this query will be fed
into `HnswGraphSearcher`, and ultimately replace the graph entry points
[here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L103-L106).
If the seed query returns no results (e.g., keywords do not match any
documents), the approach will fall back to the existing hierarchical search
process.
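
To make the idea concrete, here is a minimal, self-contained sketch of seeded entry points with a fallback. It is not Lucene code and does not use `HnswGraphSearcher`: the class `SeededGraphSearch`, its methods, and the array-based graph are all hypothetical, and the search is a simplified best-first traversal of a single (bottom) layer with the candidate list size fixed to `k`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Lucene.
final class SeededGraphSearch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static double dist(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * Uses the seed documents as entry points when there are any; otherwise
   * falls back to the entry point that the hierarchical stage would supply.
   */
  static int[] entryPoints(int[] seeds, int fallbackEntry) {
    return seeds.length > 0 ? seeds : new int[] {fallbackEntry};
  }

  /** Best-first search over one graph layer, returning the k closest nodes found. */
  static List<Integer> search(
      float[][] vectors, int[][] neighbors, float[] query, int[] entryPoints, int k) {
    Comparator<Integer> byDistance =
        Comparator.comparingDouble((Integer n) -> dist(vectors[n], query));
    // Closest unexplored node first.
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);
    // Worst of the current top-k on top, so it can be evicted cheaply.
    PriorityQueue<Integer> results = new PriorityQueue<>(byDistance.reversed());
    boolean[] visited = new boolean[vectors.length];

    for (int ep : entryPoints) {
      if (!visited[ep]) {
        visited[ep] = true;
        candidates.add(ep);
        results.add(ep);
        if (results.size() > k) {
          results.poll();
        }
      }
    }

    while (!candidates.isEmpty()) {
      int current = candidates.poll();
      // Stop once the best remaining candidate is worse than the worst result.
      if (results.size() >= k
          && dist(vectors[current], query) > dist(vectors[results.peek()], query)) {
        break;
      }
      for (int nb : neighbors[current]) {
        if (visited[nb]) {
          continue;
        }
        visited[nb] = true;
        candidates.add(nb);
        results.add(nb);
        if (results.size() > k) {
          results.poll();
        }
      }
    }

    List<Integer> out = new ArrayList<>(results);
    out.sort(byDistance);
    return out;
  }
}
```

In this sketch, a caller would pass the doc IDs matched by the seed query as `seeds`. When that array is empty, `entryPoints` returns the single fallback node, which stands in for the entry point normally found by descending the hierarchy.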

Pull request to follow.
