Re: [PR] Add support for similarity-based vector searches [lucene]

2024-05-23 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-2126824340 Yes @alessandrobenedetti that is correct -- some result may be missed if nodes along its path from the entry node score below the result threshold (but still higher than a traversal thr
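
A minimal sketch of the two-threshold idea discussed in this exchange, assuming a toy adjacency-list graph and dot-product similarity. The class `ThresholdGraphSearch`, its method names, and the sample data are hypothetical illustrations and are not Lucene's API or the PR's implementation. A neighbor is collected when its score reaches `resultThreshold`, and its own neighbors are explored only when its score reaches `traversalThreshold`, so a qualifying node can be missed when the only path to it passes through a node that scores below the traversal threshold.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only: not Lucene code.
class ThresholdGraphSearch {

  // Dot-product similarity between two vectors of equal length.
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Returns the ids of reachable nodes scoring >= resultThreshold, in ascending order.
  static List<Integer> search(float[][] vectors, int[][] neighbors, float[] query,
      int entry, float traversalThreshold, float resultThreshold) {
    boolean[] visited = new boolean[vectors.length];
    List<Integer> results = new ArrayList<>();
    Deque<Integer> toExplore = new ArrayDeque<>();

    // Assumption: the entry node is always explored, whatever its score.
    visited[entry] = true;
    if (dot(vectors[entry], query) >= resultThreshold) {
      results.add(entry);
    }
    toExplore.add(entry);

    while (!toExplore.isEmpty()) {
      int node = toExplore.poll();
      for (int next : neighbors[node]) {
        if (visited[next]) {
          continue;
        }
        visited[next] = true;
        float score = dot(vectors[next], query);
        if (score >= resultThreshold) {
          results.add(next); // good enough to return
        }
        if (score >= traversalThreshold) {
          toExplore.add(next); // good enough to keep walking through
        }
      }
    }
    Collections.sort(results);
    return results;
  }

  public static void main(String[] args) {
    // With query (1, 0) the score of each node is its x-coordinate.
    float[][] vectors = {{0.5f, 0f}, {0.95f, 0f}, {0.3f, 0f}, {0.9f, 0f}};
    int[][] neighbors = {{1, 2}, {0}, {0, 3}, {2}}; // node 3 is reachable only via node 2
    float[] query = {1f, 0f};

    // Node 2 scores 0.3, below the traversal threshold 0.4, so node 3 is never reached.
    System.out.println(search(vectors, neighbors, query, 0, 0.4f, 0.8f)); // prints [1]
    // Lowering the traversal threshold lets the walk pass through node 2.
    System.out.println(search(vectors, neighbors, query, 0, 0.2f, 0.8f)); // prints [1, 3]
  }
}
```

In this sketch a lower traversal threshold trades extra node visits for fewer missed results, which is the trade-off the question about having two thresholds points at.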

Re: [PR] Add support for similarity-based vector searches [lucene]

2024-05-23 Thread via GitHub
alessandrobenedetti commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-2126740768 Hi @kaivalnp, thanks for this contribution! My question is why do we have two thresholds, one for graph traversal (used to decide if it's worth exploring a candidate nei

Re: [PR] Add support for similarity-based vector searches [lucene]

2024-01-24 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1907958716 This feature will ship with Lucene 9.10. I'm not sure when that will be released, though [I see](https://lucene.apache.org/core/corenews.html) ~2-4 months between previous minor

Re: [PR] Add support for similarity-based vector searches [lucene]

2024-01-23 Thread via GitHub
junqiu-lei commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1907152593 Hi, do we have any scheduled release date for this exciting feature? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-11 Thread via GitHub
epotyom commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1851062374 I see random test failures that could be related to this change: ``` > java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 123 > at

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-11 Thread via GitHub
benwtrent merged PR #12679: URL: https://github.com/apache/lucene/pull/12679

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419688833 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based ve

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419666959 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based v

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419660341 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based ve

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1846106597 Thanks for all the help here @benwtrent ! > Could you add changes for Lucene 9.10? Added an entry under "New Features" (also added one of my teammates along with whom this

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419656899 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based v

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-30 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1411285972 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-30 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1411286512 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## Review Comment: Added now

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-30 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1834591675 Thanks @benwtrent! I also simplified the queries: I realized that the API may be difficult to use in the current state (we are leaving two parameters - `traversalSimilarity` and `

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-15 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812956627 > could you test on cohere with Max-inner product? Thanks, the gist was really helpful and gave some files including normalized and un-normalized vectors. I assume that since you

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-15 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812941899 > You still need to score the vectors to realize that they are in the iteration set or not Right, I meant that we need not score all *other* vectors to determine if the vector it

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1393228359 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1811057058 Keeping the `visitLimit` = 0 (immediately fall back to lazy iterator) we expect an exact search to be performed (and `recall` = 1) as soon as the first node is visited (`numVisited` = 1)

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1809743395 ### Benchmark Setup Sharing my benchmark setup for reproducibility in [this branch](https://github.com/kaivalnp/lucene/tree/similarity-benchmark) (see [this commit](https://gith

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-10 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1389857835 ## lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java: ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-10 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1389741654 ## lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java: ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-10 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1806196196 Summary of new changes: 1. Refactor into a more appropriate query - Move away from `AbstractKnnVectorQuery` to take advantage of inherent independence of segment-level results

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-25 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1780186180 Thank you! I'll try to incorporate earlier suggestions in the meanwhile

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-25 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1779866529 @kaivalnp I have been busy doing other things. I hope to look into this in the next week or so.

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-25 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1779796454 Hi @benwtrent! Curious to hear if you've been able to reproduce the benchmark?

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-20 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1367106152 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769299867 Here is the gist of my benchmark: https://gist.github.com/kaivalnp/79808017ed7666214540213d1e2a21cf I'm calculating the baseline / individual results as "count of vectors above t

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769235109 Thanks for running this @benwtrent! I just had a couple of questions: 1. What was your baseline in the test? If the baseline / goal is to "get the K-Nearest Neighbors", then th

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769218762 @kaivalnp I see the issue with my test, you are specifically testing "post-filtering" on the top values, not just getting the top10 k. I understand my issue. Could you post your

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769123216 OK, I tried testing with KnnGraphTester. I indexed 100_000 normalized Cohere vectors (768 dims). With regular knn, recall@10: ``` recall latency nDocfa

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768958493 Sorry for the confusion, I tried renaming the branch from `radius-based-vector-search` to `similarity-based-vector-search` and the PR closed automatically. I guess I'm stuck with this b

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp closed pull request #12679: Add support for similarity-based vector searches URL: https://github.com/apache/lucene/pull/12679