kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768862106
> the Collector is full by flagging "incomplete" (I think this is possible)
once a threshold is reached
Do you mean that we return incomplete results?
Instead, maybe we can
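The "flag incomplete once a threshold is reached" idea discussed in this exchange could be sketched roughly as below. This is a hypothetical illustration, not the PR's actual API; the class name `BoundedRadiusCollector` and its methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a radius collector that stops accepting hits once a
// configurable cap is reached and flags the result set as incomplete, instead
// of growing without bound.
class BoundedRadiusCollector {
    private final float minScore; // similarity threshold (the "radius")
    private final int maxHits;    // hard cap on collected hits
    private final List<Integer> docs = new ArrayList<>();
    private boolean incomplete = false;

    BoundedRadiusCollector(float minScore, int maxHits) {
        this.minScore = minScore;
        this.maxHits = maxHits;
    }

    /** Returns false once the collector refuses further hits. */
    boolean collect(int doc, float score) {
        if (score < minScore) {
            return true; // below threshold: skip this doc but keep iterating
        }
        if (docs.size() >= maxHits) {
            incomplete = true; // cap reached: mark the results as partial
            return false;
        }
        docs.add(doc);
        return true;
    }

    List<Integer> hits() { return docs; }
    boolean isIncomplete() { return incomplete; }
}
```

The key property is that worst-case memory is bounded by `maxHits` regardless of how permissive the threshold is, at the cost of possibly returning a partial result set.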
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768737424
The results:
https://github.com/apache/lucene/pull/12679#issuecomment-1766995337
Are astounding! I will try and replicate with Lucene Util.
The numbers seem almost too good …
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768734871
> Something like a lazy-loading iterator, where we perform vector
comparisons and determine whether a doc matches on #advance?
I think @kaivalnp the thing to do would be to say
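The "lazy-loading iterator" idea quoted above, where the vector comparison happens only when the iterator advances, could look roughly like this. It is a plain-Java sketch that only loosely mirrors Lucene's `DocIdSetIterator` contract; all names here are illustrative, not the PR's code.

```java
// Hypothetical sketch: instead of materializing all matching docs up front,
// compute each vector comparison lazily inside nextDoc(), keeping memory O(1).
class LazyRadiusIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final float[][] vectors; // doc ordinal -> stored vector
    private final float[] query;
    private final float minDot;      // similarity threshold
    private int doc = -1;

    LazyRadiusIterator(float[][] vectors, float[] query, float minDot) {
        this.vectors = vectors;
        this.query = query;
        this.minDot = minDot;
    }

    /** Advances to the next doc whose similarity clears the threshold. */
    int nextDoc() {
        for (int d = doc + 1; d < vectors.length; d++) {
            // the comparison is performed lazily, only when advancing
            if (dot(vectors[d], query) >= minDot) {
                return doc = d;
            }
        }
        return doc = NO_MORE_DOCS;
    }

    private static float dot(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```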
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1767022898
> stronger guarantees on the worst-case memory usage
Totally agreed @jpountz! It is very easy to go wrong in the new API,
especially if the user passes a low threshold (high radius …
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766995337
### Benchmarks
Using the vector file from
https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec (enwiki
dataset, unit vectors, 100 dimensions)
The setup was 1
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766983182
> I think of it as finding all results within a high-dimensional circle /
sphere / equivalent,
dot-product, cosine, etc. don't really follow that same idea as you point
out. I w
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766834111
Thanks for the review @shubhamvishu! Addressed some of the comments above
> Is it right to call it a radius-based search here?
I think of it as finding all results within a
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362476143
##
lucene/core/src/java/org/apache/lucene/search/RnnCollector.java:
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ *
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362474206
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362475149
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362472568
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
jpountz commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766795903
Thanks for explaining, I had overlooked how the `Integer.MAX_VALUE` was used
indeed. I'm still interested in figuring out if we can have stronger guarantees
on the worst-case memory usage …
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766741617
> If I read correctly, this query ends up calling
LeafReader#searchNearestNeighbors with k=Integer.MAX_VALUE
No, we're calling the [new
API](https://github.com/apache/lucene/blob
jpountz commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1765975335
If I read correctly, this query ends up calling
`LeafReader#searchNearestNeighbors` with k=Integer.MAX_VALUE, which will not
only run in O(maxDoc) time but also use O(maxDoc) memory. I d
shubhamvishu commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1359624510
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
Review Comment:
Let's add some tests for these going forward?
shubhamvishu commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1763170985
Thanks for adding this @kaivalnp! The idea makes sense to me, looking
forward to the benchmarks results. I left some minor comments. Sharing some
thoughts below :
1. Is it right to call it a radius-based search here? …
shubhamvishu commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1359606449
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
kaivalnp opened a new pull request, #12679:
URL: https://github.com/apache/lucene/pull/12679
### Description
Background in #12579
Add support for getting "all vectors within a radius" as opposed to getting
the "topK closest vectors" in the current system
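The contrast between the current "topK closest vectors" selection and the proposed "all vectors within a radius" selection can be illustrated over plain score arrays. Method names here are hypothetical, not Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two selection modes over precomputed scores.
class SelectionModes {
    /** All doc ordinals whose score clears the threshold (radius-style). */
    static List<Integer> withinRadius(float[] scores, float minScore) {
        List<Integer> out = new ArrayList<>();
        for (int d = 0; d < scores.length; d++) {
            if (scores[d] >= minScore) out.add(d);
        }
        return out;
    }

    /** The k highest-scoring doc ordinals (topK-style); result size is fixed
     *  by k, regardless of how many docs are actually "close". */
    static List<Integer> topK(float[] scores, int k) {
        List<Integer> all = new ArrayList<>();
        for (int d = 0; d < scores.length; d++) all.add(d);
        all.sort((a, b) -> Float.compare(scores[b], scores[a]));
        return all.subList(0, Math.min(k, all.size()));
    }
}
```

The radius form returns however many docs qualify (possibly zero, possibly all of them), which is exactly why the memory-bound discussion elsewhere in this thread matters.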
### Considerations