[GitHub] [lucene] javanna commented on a diff in pull request #12183: Make some heavy query rewrites concurrent

via GitHub Tue, 20 Jun 2023 08:03:58 -0700


javanna commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1235407541



##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +92,58 @@ public TermStates(
    * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect
    *     term statistics. Otherwise, the {@link TermState} objects will be built only when requested
    */
-  public static TermStates build(IndexReaderContext context, Term term, boolean needsStats)
+  public static TermStates build(
+      IndexSearcher indexSearcher, IndexReaderContext context, Term term, boolean needsStats)
       throws IOException {
     assert context != null && context.isTopLevel;
     final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context);
     if (needsStats) {
-      for (final LeafReaderContext ctx : context.leaves()) {
-        // if (DEBUG) System.out.println("  r=" + leaves[i].reader);
-        TermsEnum termsEnum = loadTermsEnum(ctx, term);
-        if (termsEnum != null) {
-          final TermState termState = termsEnum.termState();
-          // if (DEBUG) System.out.println("    found");
-          perReaderTermState.register(
-              termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq());
+      Executor executor = indexSearcher.getExecutor();
+      boolean isShutdown = false;
+      if (executor instanceof ExecutorService) {
+        isShutdown = ((ExecutorService) executor).isShutdown();
+      }
+      if (executor != null && isShutdown == false) {
+        // build term states concurrently
+        List<FutureTask<Integer>> tasks =
+            context.leaves().stream()
+                .map(
+                    ctx ->
+                        new FutureTask<>(
+                            () -> {
+                              TermsEnum termsEnum = loadTermsEnum(ctx, term);
+                              if (termsEnum != null) {
+                                final TermState termState = termsEnum.termState();
+                                perReaderTermState.register(
+                                    termState,
+                                    ctx.ord,
+                                    termsEnum.docFreq(),
+                                    termsEnum.totalTermFreq());
+                              }
+                              return 0;
+                            }))
+                .toList();
+        for (FutureTask<Integer> task : tasks) {
+          executor.execute(task);

Review Comment:
   Ok, so in this case you would want more slices than you'd have for
collection? It sounds like each use case would require a different slice
sizing, which is in fact not possible today. I can see why you wouldn't use
the slices here, then. But I'd still worry that a single rewrite may need too
many threads. Is that not a concern?
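
   To make the concern concrete, here is a minimal sketch of one way to cap the
number of tasks a single rewrite hands to the executor: group the leaves into
at most `maxTasks` batches and submit one `FutureTask` per batch, rather than
one per leaf as in the hunk above. This is not the PR's code and not Lucene's
API. `BoundedRewriteTasks`, `partition`, `runGrouped`, `maxTasks` and
`perLeafWork` are hypothetical names used only for illustration, and the leaves
are stood in for by plain list elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;
import java.util.function.Consumer;

// Hypothetical helper, not part of Lucene: bounds the number of tasks per rewrite.
public final class BoundedRewriteTasks {

  private BoundedRewriteTasks() {}

  /** Splits items into at most maxTasks contiguous groups of near-equal size. */
  static <T> List<List<T>> partition(List<T> items, int maxTasks) {
    if (maxTasks < 1) {
      throw new IllegalArgumentException("maxTasks must be >= 1");
    }
    int groups = Math.min(maxTasks, items.size());
    List<List<T>> result = new ArrayList<>(groups);
    int start = 0;
    for (int g = 0; g < groups; g++) {
      // The first (size % groups) groups take one extra item.
      int size = items.size() / groups + (g < items.size() % groups ? 1 : 0);
      result.add(items.subList(start, start + size));
      start += size;
    }
    return result;
  }

  /**
   * Runs perLeafWork for every leaf, using one task per group instead of one
   * task per leaf. Returns the number of tasks handed to the executor.
   */
  static <T> int runGrouped(
      Executor executor, List<T> leaves, int maxTasks, Consumer<T> perLeafWork) {
    List<FutureTask<Void>> tasks = new ArrayList<>();
    for (List<T> group : partition(leaves, maxTasks)) {
      Callable<Void> work =
          () -> {
            for (T leaf : group) {
              perLeafWork.accept(leaf);
            }
            return null;
          };
      tasks.add(new FutureTask<>(work));
    }
    for (FutureTask<Void> task : tasks) {
      executor.execute(task);
    }
    // Wait for every task so the caller sees a fully populated result.
    for (FutureTask<Void> task : tasks) {
      try {
        task.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    return tasks.size();
  }
}
```

   With ten leaves and `maxTasks` set to 3, this submits three tasks covering
4, 3 and 3 leaves. The trade-off is the one raised above: fewer, larger
batches keep the thread demand of one rewrite bounded, at the cost of less
parallelism per rewrite.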



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


