kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-2126824340
Yes @alessandrobenedetti that is correct -- some result may be missed if
nodes along its path from the entry node score below the result threshold (but
still higher than a traversal thr
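As I read the truncated comment above, the idea can be sketched on a toy graph. This is not Lucene's implementation: the class, the `search` helper, the parameter name `resultSimilarity`, and the scores are assumptions made up for illustration (`traversalSimilarity` is the one name that does appear in this thread). Every visited node is kept as a result if it scores at or above the result threshold, but its neighbors are explored only if it scores at or above the traversal threshold.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoThresholdSketch {

    /**
     * Hypothetical helper (not a Lucene API): breadth-first walk from {@code entry}.
     * A visited node is collected when its score is >= resultSimilarity, and its
     * neighbors are explored only when its score is >= traversalSimilarity.
     */
    static List<Integer> search(int[][] neighbors, float[] scores, int entry,
                                float traversalSimilarity, float resultSimilarity) {
        List<Integer> results = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> toExplore = new ArrayDeque<>();
        visited.add(entry);
        toExplore.add(entry);
        while (!toExplore.isEmpty()) {
            int node = toExplore.poll();
            if (scores[node] >= resultSimilarity) {
                results.add(node); // good enough to return
            }
            if (scores[node] < traversalSimilarity) {
                continue; // too dissimilar to be worth expanding
            }
            for (int neighbor : neighbors[node]) {
                if (visited.add(neighbor)) {
                    toExplore.add(neighbor);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Chain 0 - 1 - 2: node 2 is a good match, but only reachable through node 1.
        int[][] neighbors = {{1}, {0, 2}, {1}};
        float[] scores = {0.9f, 0.6f, 0.85f}; // made-up similarities to the query

        // Single threshold: node 1 blocks the path, so node 2 is missed.
        System.out.println(search(neighbors, scores, 0, 0.8f, 0.8f));
        // Lower traversal threshold: node 1 is expanded and node 2 is found.
        System.out.println(search(neighbors, scores, 0, 0.5f, 0.8f));
    }
}
```

In this sketch, lowering only the traversal threshold widens the explored region without admitting weaker results, at the cost of scoring more nodes.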
alessandrobenedetti commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-2126740768
Hi @kaivalnp, thanks for this contribution!
My question is why do we have two thresholds, one for graph traversal (used
to decide if it's worth exploring a candidate nei
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1907958716
This feature will ship with Lucene 9.10
I'm not sure when that will be released, though [I
see](https://lucene.apache.org/core/corenews.html) \~2-4 months between
previous minor
junqiu-lei commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1907152593
Hi, do we have any scheduled release date for this exciting feature?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub a
epotyom commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1851062374
I see random test failures that could be related to this change:
```
> java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds
for length 123
> at
benwtrent merged PR #12679:
URL: https://github.com/apache/lucene/pull/12679
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1419688833
##
lucene/CHANGES.txt:
##
@@ -167,7 +167,10 @@ API Changes
New Features
-
-(No changes)
+
+* GITHUB#12679: Add support for similarity-based ve
benwtrent commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1419666959
##
lucene/CHANGES.txt:
##
@@ -167,7 +167,10 @@ API Changes
New Features
-
-(No changes)
+
+* GITHUB#12679: Add support for similarity-based v
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1419660341
##
lucene/CHANGES.txt:
##
@@ -167,7 +167,10 @@ API Changes
New Features
-
-(No changes)
+
+* GITHUB#12679: Add support for similarity-based ve
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1846106597
Thanks for all the help here @benwtrent !
> Could you add changes for Lucene 9.10?
Added an entry under "New Features" (also added one of my teammates along
with whom this
benwtrent commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1419656899
##
lucene/CHANGES.txt:
##
@@ -167,7 +167,10 @@ API Changes
New Features
-
-(No changes)
+
+* GITHUB#12679: Add support for similarity-based v
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1411285972
##
lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java:
##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1411286512
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
Review Comment:
Added now
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1834591675
Thanks @benwtrent! I also simplified the queries:
I realized that the API may be difficult to use in the current state (we are
leaving two parameters - `traversalSimilarity` and `
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812956627
> could you test on cohere with Max-inner product?
Thanks, the gist was really helpful and gave some files including normalized
and un-normalized vectors. I assume that since you
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812941899
> You still need to score the vectors to realize that they are in the
iteration set or not
Right, I meant that we need not score all *other* vectors to determine if
the vector it
benwtrent commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1393228359
##
lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java:
##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1811057058
Keeping `visitLimit` = 0 (immediately fall back to the lazy iterator), we
expect an exact search to be performed (and `recall` = 1) as soon as the first
node is visited (`numVisited` = 1)
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1809743395
### Benchmark Setup
Sharing my benchmark setup for reproducibility in [this
branch](https://github.com/kaivalnp/lucene/tree/similarity-benchmark) (see
[this
commit](https://gith
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1389857835
##
lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under on
benwtrent commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1389741654
##
lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under o
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1806196196
Summary of new changes:
1. Refactor into a more appropriate query
- Move away from `AbstractKnnVectorQuery` to take advantage of inherent
independence of segment-level results
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1780186180
Thank you! I'll try to incorporate earlier suggestions in the meantime
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1779866529
@kaivalnp I have been busy doing other things. I hope to look into this in
the next week or so.
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1779796454
Hi @benwtrent! Curious to hear if you've been able to reproduce the
benchmark?
shubhamvishu commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1367106152
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769299867
Here is the gist of my benchmark:
https://gist.github.com/kaivalnp/79808017ed7666214540213d1e2a21cf
I'm calculating the baseline / individual results as "count of vectors above
t
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769235109
Thanks for running this @benwtrent!
I just had a couple of questions:
1. What was your baseline in the test? If the baseline / goal is to "get the
K-Nearest Neighbors", then th
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769218762
@kaivalnp I see the issue with my test, you are specifically testing
"post-filtering" on the top values, not just getting the top10 k. I understand
my issue.
Could you post your
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769123216
OK, I tried testing with KnnGraphTester.
I indexed 100_000 normalized Cohere vectors (768 dims).
With regular knn, recall@10:
```
recall latency nDocfa
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768958493
Sorry for the confusion, I tried renaming the branch from
`radius-based-vector-search` to `similarity-based-vector-search` and the PR
closed automatically. I guess I'm stuck with this b
kaivalnp closed pull request #12679: Add support for similarity-based vector
searches
URL: https://github.com/apache/lucene/pull/12679