[GitHub] [lucene] jpountz commented on a diff in pull request #12183: Make TermStates#build concurrent

via GitHub Thu, 21 Sep 2023 04:30:52 -0700


jpountz commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1332900264



##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +90,48 @@ public TermStates(
    * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect
    *     term statistics. Otherwise, the {@link TermState} objects will be built only when requested
    */
-  public static TermStates build(IndexReaderContext context, Term term, boolean needsStats)
+  public static TermStates build(IndexSearcher indexSearcher, Term term, boolean needsStats)
       throws IOException {
-    assert context != null && context.isTopLevel;
+    IndexReaderContext context = indexSearcher.getTopReaderContext();
+    assert context != null;
     final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context);
     if (needsStats) {
-      for (final LeafReaderContext ctx : context.leaves()) {
-        // if (DEBUG) System.out.println("  r=" + leaves[i].reader);
-        TermsEnum termsEnum = loadTermsEnum(ctx, term);
-        if (termsEnum != null) {
-          final TermState termState = termsEnum.termState();
-          // if (DEBUG) System.out.println("    found");
-          perReaderTermState.register(
-              termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq());
+      TaskExecutor taskExecutor = indexSearcher.getTaskExecutor();
+      if (taskExecutor != null) {
+        // build the term states concurrently
+        List<TaskExecutor.Task<TermStateInfo>> tasks =
+            context.leaves().stream()
+                .map(
+                    ctx ->
+                        taskExecutor.createTask(
+                            () -> {
+                              TermsEnum termsEnum = loadTermsEnum(ctx, term);
+                              if (termsEnum != null) {
+                                return new TermStateInfo(
+                                    termsEnum.termState(),
+                                    ctx.ord,
+                                    termsEnum.docFreq(),
+                                    termsEnum.totalTermFreq());
+                              }
+                              return null;
+                            }))
+                .toList();
+        List<TaskExecutor.Task<TermStateInfo>> taskList = new ArrayList<>(tasks);

Review Comment:
   Why do we need to clone the list?
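
For context, one relevant fact about the pattern in the hunk: `Stream.toList()` (Java 16+) returns an unmodifiable list, while `new ArrayList<>(...)` produces a mutable copy. Whether the PR actually needs a mutable list at this point is not stated in the diff, so treat the following as a minimal sketch of the language behavior only. The class and method names are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Lucene or the PR.
public class ToListCopySketch {

  /** Returns true if adding to the given list throws UnsupportedOperationException. */
  static boolean rejectsAdd(List<String> list) {
    try {
      list.add("extra");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Stream.toList() yields an unmodifiable list.
    List<String> fromStream = Stream.of("a", "b").toList();
    // Copying into an ArrayList yields an independent, mutable list.
    List<String> copy = new ArrayList<>(fromStream);

    System.out.println(rejectsAdd(fromStream)); // prints "true"
    System.out.println(rejectsAdd(copy)); // prints "false"
  }
}
```

If the list is only iterated and never mutated afterwards, the copy adds nothing and the result of `toList()` could be used directly.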



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

