kaivalnp closed pull request #12820: Re-use information from graph traversal
during exact search
URL: https://github.com/apache/lucene/pull/12820
kaivalnp commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1959568841
Thanks for checking @benwtrent!
We primarily improve cases of using a high topK + a selective filter (good
rate of fallback, large number of duplicate computations). I notice ~5%
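The scenario in this comment (a graph traversal followed by a fallback to exact search over the filtered documents) can be illustrated with a small sketch. It is not the PR's implementation and uses no Lucene API: `ReuseScoresSketch`, `traverse`, `exactSearch`, and the `computations` counter are hypothetical names, and the "graph traversal" is reduced to a fixed list of visited documents. It only shows the general idea that scores computed during traversal can be cached and reused, so the exact pass computes a similarity only for documents the traversal never reached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are Lucene APIs. */
final class ReuseScoresSketch {
  /** Counts how many similarity computations were actually performed. */
  static int computations = 0;

  static float dotProduct(float[] a, float[] b) {
    computations++;
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Stand-in for the approximate search: scores each visited doc and caches the score. */
  static void traverse(float[][] vectors, float[] query, int[] visited, Map<Integer, Float> cache) {
    for (int doc : visited) {
      cache.put(doc, dotProduct(vectors[doc], query));
    }
  }

  /** Exact search over the filter-accepted docs, reusing any score already in the cache. */
  static List<Integer> exactSearch(
      float[][] vectors, float[] query, int[] accepted, int k, Map<Integer, Float> cache) {
    Map<Integer, Float> scores = new HashMap<>();
    for (int doc : accepted) {
      Float cached = cache.get(doc);
      scores.put(doc, cached != null ? cached : dotProduct(vectors[doc], query));
    }
    List<Integer> docs = new ArrayList<>(scores.keySet());
    // Highest score first; ties broken by the smaller doc id.
    docs.sort((a, b) -> {
      int c = Float.compare(scores.get(b), scores.get(a));
      return c != 0 ? c : Integer.compare(a, b);
    });
    return docs.subList(0, Math.min(k, docs.size()));
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}, {2f, 0f}};
    float[] query = {1f, 0f};
    Map<Integer, Float> cache = new HashMap<>();
    traverse(vectors, query, new int[] {0, 3}, cache);
    List<Integer> top = exactSearch(vectors, query, new int[] {0, 1, 3}, 2, cache);
    System.out.println(top + " using " + computations + " similarity computations");
  }
}
```

In this toy setup the more documents the traversal visits (a higher `topK`) and the more often the fallback happens (a more selective filter), the more cached scores are reused, which matches the cases the comment says benefit most.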
benwtrent commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1957923215
I have done some more benchmarking and there isn't really a significant
improvement. This is over 500k vectors of 1024 dimensions, getting the
nearest 500 neighbors.
Baseline
```
late
```
github-actions[bot] commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1880899839
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
benwtrent commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1419374932
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnCollector.java:
##
@@ -66,4 +69,19 @@ public final int k() {
@Override
public abstract TopDocs to
benwtrent commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1419363698
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnCollector.java:
##
@@ -66,4 +69,19 @@ public final int k() {
@Override
public abstract TopDocs to
kaivalnp commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1819930817
Yes, the restrictive filter will cause more fallbacks to `#exactSearch`, and
the high `topK` will mean more nodes visited, which saves more duplicate work
> So we see a 5-10% improvem
kaivalnp commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399830085
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java:
##
@@ -158,59 +95,4 @@ public int hashCode() {
result = 31 * r
kaivalnp commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399829891
##
lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java:
##
@@ -76,11 +73,11 @@ public KnnFloatVectorQuery(String field, float[] target,
int k, Que
kaivalnp commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399829461
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -171,33 +181,23 @@ protected TopDocs exactSearch(LeafReaderContext context,
DocId
kaivalnp commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399829328
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -155,14 +159,20 @@ protected boolean match(int doc) {
}
}
- protected a
kaivalnp commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399829104
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx,
Weight f
vigyasharma commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399533915
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -155,14 +159,20 @@ protected boolean match(int doc) {
}
}
- protecte
kaivalnp commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1816720340
Thanks @jpountz! I realised something from your comment:
My current implementation has a flaw, because it cannot handle the
[`OrdinalTranslatedKnnCollector`](https://github.com/ka
jpountz commented on PR #12820:
URL: https://github.com/apache/lucene/pull/12820#issuecomment-1815358559
This is an interesting idea. Ideally we would figure out up-front whether
it's best to use the graph or not, but I can also imagine that we can't always
make the right decision there, so
kaivalnp opened a new pull request, #12820:
URL: https://github.com/apache/lucene/pull/12820
### Description
In KNN queries with a pre-filter, we first perform an approximate graph
search and then
[fallback](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/l