Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768862106 > the Collector is full by flagging "incomplete" (I think this is possible) once a threshold is reached Do you mean that we return incomplete results? Instead, maybe we can

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768737424 The results: https://github.com/apache/lucene/pull/12679#issuecomment-1766995337 Are astounding! I will try and replicate with Lucene Util. The numbers seem almost too goo

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768734871 > Something like a lazy-loading iterator, where we perform vector comparisons and determine whether a doc matches on #advance? I think @kaivalnp the thing to do would be to say

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1767022898 > stronger guarantees on the worst-case memory usage Totally agreed @jpountz! It is very easy to go wrong in the new API, specially if the user passes a low threshold (high radius

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766995337 ### Benchmarks Using the vector file from https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec (enwiki dataset, unit vectors, 100 dimensions) The setup was 1

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766983182 > I think of it as finding all results within a high-dimensional circle / sphere / equivalent, dot-product, cosine, etc. don't really follow that same idea as you point out. I w

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766834111 Thanks for the review @shubhamvishu! Addressed some of the comments above > Is it right to call it a radius-based search here? I think of it as finding all results within a

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362476143 ## lucene/core/src/java/org/apache/lucene/search/RnnCollector.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362474206 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362475149 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362472568 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766795903 Thanks for explaining, I had overlooked how the `Integer.MAX_VALUE` was used indeed. I'm still interested in figuring out if we can have stronger guarantees on the worst-case memory usag

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766741617 > If I read correctly, this query ends up calling LeafReader#searchNearestNeighbors with k=Integer.MAX_VALUE No, we're calling the [new API](https://github.com/apache/lucene/blob

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1765975335 If I read correctly, this query ends up calling `LeafReader#searchNearestNeighbors` with k=Integer.MAX_VALUE, which will not only run in O(maxDoc) time but also use O(maxDoc) memory. I d

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1359624510 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## Review Comment: Lets add some tests for these going forward? -- This is

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1763170985 Thanks for adding this @kaivalnp! The idea makes sense to me, looking forward to the benchmarks results. I left some minor comments. Sharing some thoughts below : 1. Is it ri

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1359606449 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[PR] Add support for radius-based vector searches [lucene]

2023-10-14 Thread via GitHub
kaivalnp opened a new pull request, #12679: URL: https://github.com/apache/lucene/pull/12679 ### Description Background in #12579 Add support for getting "all vectors within a radius" as opposed to getting the "topK closest vectors" in the current system ### Consideratio