kaivalnp commented on code in PR #12160:
URL: https://github.com/apache/lucene/pull/12160#discussion_r1120780564


##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -73,17 +77,48 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
               .build();
       Query rewritten = indexSearcher.rewrite(booleanQuery);
       filterWeight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f);
+    } else {
+      filterWeight = null;
     }
 
-    for (LeafReaderContext ctx : reader.leaves()) {
-      TopDocs results = searchLeaf(ctx, filterWeight);
-      if (ctx.docBase > 0) {
-        for (ScoreDoc scoreDoc : results.scoreDocs) {
-          scoreDoc.doc += ctx.docBase;
-        }
-      }
-      perLeafResults[ctx.ord] = results;
-    }
+    List<FutureTask<TopDocs>> tasks =
+        reader.leaves().stream()
+            .map(
+                ctx ->
+                    new FutureTask<>(
+                        () -> {
+                          try {
+                            TopDocs results = searchLeaf(ctx, filterWeight);
+                            if (ctx.docBase > 0) {
+                              for (ScoreDoc scoreDoc : results.scoreDocs) {
+                                scoreDoc.doc += ctx.docBase;
+                              }
+                            }
+                            return results;
+                          } catch (IOException e) {
+                            throw new UncheckedIOException(e);
+                          }
+                        }))
+            .toList();
+
+    Executor executor = Objects.requireNonNullElse(indexSearcher.getExecutor(), Runnable::run);
+    SliceExecutor sliceExecutor = new SliceExecutor(executor);
+    sliceExecutor.invokeAll(tasks);

Review Comment:
   > We shouldn't bother with any parallelism if indexSearcher.getExecutor() == null || reader.leaves().size() <= 1. Its a simple if branch that allows us to remove all the overhead associated with parallel rewrites when no parallelism can be achieved.
   
   I would totally agree with you here! We shouldn't add overhead for non-concurrent executions.
   If I understand correctly, you are suggesting adding an `if` block with the condition (rough sketch after this list):
   - When it evaluates to `true`, we perform the search [as before](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L78-L86)
   - `else` we perform the search [as per this PR](https://github.com/kaivalnp/lucene/blob/concurrent-knn/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L84-L120)
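
   Here is how I read that branching, as an illustrative sketch against the PR's `rewrite` method (not a final patch; the `else` body would be the task-based code from the diff above):
   ```java
   if (indexSearcher.getExecutor() == null || reader.leaves().size() <= 1) {
     // no parallelism possible: keep the sequential per-leaf search, as on main
     for (LeafReaderContext ctx : reader.leaves()) {
       TopDocs results = searchLeaf(ctx, filterWeight);
       if (ctx.docBase > 0) {
         for (ScoreDoc scoreDoc : results.scoreDocs) {
           scoreDoc.doc += ctx.docBase;
         }
       }
       perLeafResults[ctx.ord] = results;
     }
   } else {
     // executor present and more than one segment: build the per-leaf FutureTasks
     // and run them through SliceExecutor#invokeAll, as in the diff above
   }
   ```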
   
   I have tried to implement the changes [here](https://github.com/apache/lucene/compare/main...kaivalnp:lucene:concurrent-knn-reduce-overhead). I ran some benchmarks for these (with the executor as `null`):
   
   ### enwiki (topK = 100, segment count = 10, executor = null)
   | recall | Sequential | SliceExecutor | ReduceOverhead | nDoc | fanout | maxConn | beamWidth |
   |-|-|-|-|-|-|-|-|
   | 0.995 |  0.95 |  0.96 |  0.95 |   10000 |  0 | 16 | 32 |
   | 0.998 |  1.26 |  1.30 |  1.29 |   10000 | 50 | 16 | 32 |
   | 0.998 |  1.05 |  1.07 |  1.07 |   10000 |  0 | 16 | 64 |
   | 0.999 |  1.41 |  1.43 |  1.43 |   10000 | 50 | 16 | 64 |
   | 0.995 |  0.98 |  0.99 |  0.98 |   10000 |  0 | 32 | 32 |
   | 0.998 |  1.31 |  1.33 |  1.34 |   10000 | 50 | 32 | 32 |
   | 0.998 |  0.99 |  1.01 |  1.01 |   10000 |  0 | 32 | 64 |
   | 0.999 |  1.33 |  1.36 |  1.36 |   10000 | 50 | 32 | 64 |
   | 0.987 |  1.70 |  1.70 |  1.71 |  100000 |  0 | 16 | 32 |
   | 0.992 |  2.30 |  2.30 |  2.31 |  100000 | 50 | 16 | 32 |
   | 0.993 |  1.92 |  1.89 |  1.94 |  100000 |  0 | 16 | 64 |
   | 0.996 |  2.63 |  2.65 |  2.64 |  100000 | 50 | 16 | 64 |
   | 0.987 |  1.73 |  1.70 |  1.74 |  100000 |  0 | 32 | 32 |
   | 0.992 |  2.34 |  2.30 |  2.37 |  100000 | 50 | 32 | 32 |
   | 0.994 |  1.96 |  1.92 |  1.98 |  100000 |  0 | 32 | 64 |
   | 0.997 |  2.66 |  2.61 |  2.69 |  100000 | 50 | 32 | 64 |
   | 0.971 |  2.72 |  2.70 |  2.74 | 1000000 |  0 | 16 | 32 |
   | 0.982 |  3.77 |  3.79 |  3.78 | 1000000 | 50 | 16 | 32 |
   | 0.985 |  3.13 |  3.19 |  3.19 | 1000000 |  0 | 16 | 64 |
   | 0.991 |  4.34 |  4.37 |  4.36 | 1000000 | 50 | 16 | 64 |
   | 0.973 |  2.86 |  2.94 |  2.94 | 1000000 |  0 | 32 | 32 |
   | 0.983 |  3.94 |  3.98 |  3.97 | 1000000 | 50 | 32 | 32 |
   | 0.986 |  3.38 |  3.37 |  3.38 | 1000000 |  0 | 32 | 64 |
   | 0.992 |  4.63 |  4.66 |  4.67 | 1000000 | 50 | 32 | 64 |
   
   ### enwiki (topK = 100, segment count = 5, executor = null)
   | recall | Sequential | SliceExecutor | ReduceOverhead | nDoc | fanout | maxConn | beamWidth |
   |-|-|-|-|-|-|-|-|
   | 0.991 |  0.59 |  0.61 |  0.59 |   10000 |  0 | 16 | 32 |
   | 0.996 |  0.82 |  0.83 |  0.81 |   10000 | 50 | 16 | 32 |
   | 0.997 |  0.61 |  0.62 |  0.60 |   10000 |  0 | 16 | 64 |
   | 0.999 |  0.88 |  0.88 |  0.86 |   10000 | 50 | 16 | 64 |
   | 0.991 |  0.59 |  0.59 |  0.58 |   10000 |  0 | 32 | 32 |
   | 0.995 |  0.80 |  0.81 |  0.80 |   10000 | 50 | 32 | 32 |
   | 0.997 |  0.64 |  0.64 |  0.62 |   10000 |  0 | 32 | 64 |
   | 0.999 |  0.87 |  0.88 |  0.89 |   10000 | 50 | 32 | 64 |
   | 0.978 |  1.09 |  1.08 |  1.08 |  100000 |  0 | 16 | 32 |
   | 0.987 |  1.29 |  1.32 |  1.34 |  100000 | 50 | 16 | 32 |
   | 0.989 |  1.10 |  1.09 |  1.10 |  100000 |  0 | 16 | 64 |
   | 0.994 |  1.48 |  1.49 |  1.46 |  100000 | 50 | 16 | 64 |
   | 0.977 |  0.98 |  0.99 |  0.98 |  100000 |  0 | 32 | 32 |
   | 0.987 |  1.33 |  1.35 |  1.34 |  100000 | 50 | 32 | 32 |
   | 0.989 |  1.13 |  1.14 |  1.13 |  100000 |  0 | 32 | 64 |
   | 0.994 |  1.55 |  1.55 |  1.53 |  100000 | 50 | 32 | 64 |
   | 0.957 |  1.48 |  1.52 |  1.49 | 1000000 |  0 | 16 | 32 |
   | 0.972 |  2.03 |  2.08 |  2.04 | 1000000 | 50 | 16 | 32 |
   | 0.976 |  1.70 |  1.73 |  1.71 | 1000000 |  0 | 16 | 64 |
   | 0.985 |  2.42 |  2.45 |  2.47 | 1000000 | 50 | 16 | 64 |
   | 0.959 |  1.67 |  1.65 |  1.66 | 1000000 |  0 | 32 | 32 |
   | 0.974 |  2.13 |  2.15 |  2.16 | 1000000 | 50 | 32 | 32 |
   | 0.978 |  1.89 |  1.84 |  1.89 | 1000000 |  0 | 32 | 64 |
   | 0.987 |  2.52 |  2.53 |  2.55 | 1000000 | 50 | 32 | 64 |
   
   ### enwiki (topK = 100, segment count = 1, executor = null)
   | recall | Sequential | SliceExecutor | ReduceOverhead | nDoc | fanout | maxConn | beamWidth |
   |-|-|-|-|-|-|-|-|
   | 0.941 |  0.22 |  0.21 |  0.24 |   10000 |  0 | 16 | 32 |
   | 0.970 |  0.24 |  0.24 |  0.25 |   10000 | 50 | 16 | 32 |
   | 0.965 |  0.20 |  0.19 |  0.20 |   10000 |  0 | 16 | 64 |
   | 0.984 |  0.28 |  0.27 |  0.28 |   10000 | 50 | 16 | 64 |
   | 0.941 |  0.18 |  0.17 |  0.18 |   10000 |  0 | 32 | 32 |
   | 0.970 |  0.24 |  0.23 |  0.23 |   10000 | 50 | 32 | 32 |
   | 0.966 |  0.20 |  0.20 |  0.20 |   10000 |  0 | 32 | 64 |
   | 0.985 |  0.28 |  0.27 |  0.26 |   10000 | 50 | 32 | 64 |
   | 0.909 |  0.27 |  0.27 |  0.27 |  100000 |  0 | 16 | 32 |
   | 0.945 |  0.38 |  0.36 |  0.37 |  100000 | 50 | 16 | 32 |
   | 0.944 |  0.32 |  0.30 |  0.30 |  100000 |  0 | 16 | 64 |
   | 0.969 |  0.43 |  0.41 |  0.42 |  100000 | 50 | 16 | 64 |
   | 0.914 |  0.28 |  0.28 |  0.29 |  100000 |  0 | 32 | 32 |
   | 0.948 |  0.39 |  0.38 |  0.38 |  100000 | 50 | 32 | 32 |
   | 0.949 |  0.30 |  0.30 |  0.32 |  100000 |  0 | 32 | 64 |
   | 0.972 |  0.44 |  0.41 |  0.40 |  100000 | 50 | 32 | 64 |
   | 0.870 |  0.35 |  0.34 |  0.35 | 1000000 |  0 | 16 | 32 |
   | 0.911 |  0.49 |  0.48 |  0.47 | 1000000 | 50 | 16 | 32 |
   | 0.913 |  0.40 |  0.40 |  0.41 | 1000000 |  0 | 16 | 64 |
   | 0.945 |  0.55 |  0.55 |  0.56 | 1000000 | 50 | 16 | 64 |
   | 0.881 |  0.38 |  0.39 |  0.38 | 1000000 |  0 | 32 | 32 |
   | 0.919 |  0.52 |  0.52 |  0.52 | 1000000 | 50 | 32 | 32 |
   | 0.923 |  0.45 |  0.45 |  0.46 | 1000000 |  0 | 32 | 64 |
   | 0.954 |  0.62 |  0.62 |  0.61 | 1000000 | 50 | 32 | 64 |
   
   There are a few places where it gives some speedup, but the gains seem too small to justify it. (Note that there is also some logic duplication [here](https://github.com/kaivalnp/lucene/blob/concurrent-knn-reduce-overhead/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L88-L93) and [here](https://github.com/kaivalnp/lucene/blob/concurrent-knn-reduce-overhead/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L103-L108), which we would want to avoid, maybe by wrapping it in a callable, as sketched below. I tried that out locally and it performed similarly or worse.)
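
   By "wrapping it in a callable" I mean something like this hypothetical helper (illustrative sketch, not the branch's actual code), so both paths share the per-leaf logic:
   ```java
   // Hypothetical helper: the sequential path could run()/get() the task directly,
   // while the parallel path hands the same tasks to SliceExecutor#invokeAll
   private FutureTask<TopDocs> searchLeafTask(LeafReaderContext ctx, Weight filterWeight) {
     return new FutureTask<>(
         () -> {
           TopDocs results = searchLeaf(ctx, filterWeight);
           if (ctx.docBase > 0) {
             // map segment-local doc ids to global ids, as in the PR
             for (ScoreDoc scoreDoc : results.scoreDocs) {
               scoreDoc.doc += ctx.docBase;
             }
           }
           return results;
         });
   }
   ```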
   
   In the absence of an executor, we set it to `Runnable::run`, which runs the same tasks sequentially on the calling thread. My guess is that this adds very little overhead compared to the search tasks themselves, and IMO keeping a single, readable code path outweighs the benefit of a separate `if` block.
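
   (Small illustration of why this works: `Runnable::run` is a valid `Executor` whose `execute` simply invokes the task inline, so the "parallel" machinery degrades to a plain sequential loop.)
   ```java
   // Runnable::run implements Executor#execute(Runnable) by calling run()
   // on the calling thread, i.e. tasks execute one after another, in order
   Executor sameThread = Runnable::run;
   sameThread.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
   ```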
   
   Please let me know what you think, or if you had something else in mind?
   
   Edit: Sorry, the links in this comment are now broken because they pointed to specific lines at the time of writing. Now that the underlying branch has been updated, they point to unrelated places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

