vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2062496104
Backported to `branch_9x`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2055680667
FYI I opened #13285 to add the same timeout support to
`AbstractVectorSimilarityQuery`
--
This is an automated message from the Apache Git Service.
To respond to the message, please l
benwtrent commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2038084746
@kaivalnp I too have had this fail, but it is not deterministic. I created
https://github.com/apache/lucene/issues/13272 to track it.
I can mute the test while we wait for the fi
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2037932211
I see a couple of recent failures related to the test added here:
1. https://github.com/apache/lucene/actions/runs/8557766498/job/23450804532
2. https://github.com/apache/lucene/act
vigyasharma merged PR #13202:
URL: https://github.com/apache/lucene/pull/13202
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene
vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2032687387
> > `TimeLimitingBulkScorer` already optimizes for timeout check frequency
outside of `QueryTimeout` impl
>
> Ahh nice catch! You mean something like:
>
> ```java
> /
kaivalnp commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1547738734
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java:
##
@@ -100,8 +102,15 @@ protected TopDocs exactSearch(LeafReaderCo
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2031861001
> `TimeLimitingBulkScorer` already optimizes for timeout check frequency
outside of `QueryTimeout` impl
Ahh nice catch! You mean something like:
```java
// counter is an
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1546839548
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java:
##
@@ -100,8 +102,15 @@ protected TopDocs exactSearch(LeafReade
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1546839548
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java:
##
@@ -100,8 +102,15 @@ protected TopDocs exactSearch(LeafReade
vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2030543275
I think this PR's changes are okay to be merged. @kaivalnp: can you please
resolve the version conflicts and add a changes entry, and I can merge it in.
--
This is an automated mes
vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2030540106
> Perhaps this can be configured by the end-user themselves, by making
actual timeout checks after every TK number of calls, according to acceptable
latency / accuracy tradeoffs?
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2025776234
> I am guessing the `QueryTimeout` object has to do some global syncing to
determine the current runtime to check
I have used the default
[`QueryTimeoutImpl`](https://github.com/
benwtrent commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2024967034
> unit vectors, 100 dimensions, indexed first 1M vectors,
@kaivalnp AH, ok, only 100 dimensions, this explains the overly large impact
that checking for timeout had, as there is
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2024899453
> What was the dimension count & dataset for your performance testing
It was the enwiki dataset
(https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec), unit
vectors
benwtrent commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2022583444
Looking at the benchmarking, we are adding a 5% overhead to all vector
operations when using float32. As vector operations get faster (consider
hamming distance with exploring more vec
kaivalnp commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1540166440
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java:
##
@@ -100,8 +102,15 @@ protected TopDocs exactSearch(LeafReaderCo
kaivalnp commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1540162745
##
lucene/core/src/test/org/apache/lucene/search/TestKnnByteVectorQuery.java:
##
@@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws
IOException {
kaivalnp commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1540033526
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1539995871
##
lucene/core/src/test/org/apache/lucene/search/TestKnnByteVectorQuery.java:
##
@@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws
IOException
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2020534763
> Separately, should we deprecate `TimeLimitingCollector` ? It doesn't use
`QueryTimeout` and I don't think we're using it anywhere.
Created #13220 to discuss this
--
This is a
kaivalnp commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1537438862
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1536974998
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1536969204
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1536968149
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2017007823
Thanks for the review @vigyasharma!
> Apart from those, one divergent behavior is that we won't be raising some
form of `TimeExceededException` when we timeout and terminate the s
kaivalnp commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1536943966
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2016947960
Separately, should we deprecate `TimeLimitingCollector` ? It doesn't use
`QueryTimeout` and I don't think we're using it anywhere.
--
This is an automated message from the Apache G
vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2016947180
> ### baseline with `null` timeout
> ```
> 0.965 1.10 100 50 16 100 16750 1.00
post-filter
> 0.992 2.82 100 400 16
vigyasharma commented on code in PR #13202:
URL: https://github.com/apache/lucene/pull/13202#discussion_r1536908711
##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2016595864
Exactly! This issue is also visible from the above benchmarks, where the
`baseline` (having no early termination in `#rewrite`) performs the entire
graph search (same number of nodes vi
vigyasharma commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2016566261
> If we had set a timeout, and HNSW searches turned out too expensive -- we
would not return results even after the search completes (because the timeout
is [checked before attemptin
kaivalnp commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2016087745
### baseline with `null` timeout
```
0.965 1.10 100 50 16 100 16750 1.00
post-filter
0.992 2.82 100 400 16 10
kaivalnp opened a new pull request, #13202:
URL: https://github.com/apache/lucene/pull/13202
### Description
#927 added timeout support to
[`IndexSearcher#search`](https://github.com/apache/lucene/blob/75e1ebc4501733a6a9857858b3fd9c30304234d0/lucene/core/src/java/org/apache/lucene/sea
34 matches
Mail list logo