[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make some heavy query rewrites concurrent

via GitHub Wed, 13 Sep 2023 02:17:14 -0700


shubhamvishu commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324225960



##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +93,58 @@ public TermStates(
   * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect
   *     term statistics. Otherwise, the {@link TermState} objects will be built only when requested
   */
-  public static TermStates build(IndexReaderContext context, Term term, boolean needsStats)
+  public static TermStates build(IndexSearcher indexSearcher, Term term, boolean needsStats)
       throws IOException {
-    assert context != null && context.isTopLevel;
+    IndexReaderContext context = indexSearcher.getTopReaderContext();
+    assert context != null;
     final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context);
     if (needsStats) {
-      for (final LeafReaderContext ctx : context.leaves()) {
-        // if (DEBUG) System.out.println("  r=" + leaves[i].reader);
-        TermsEnum termsEnum = loadTermsEnum(ctx, term);
-        if (termsEnum != null) {
-          final TermState termState = termsEnum.termState();
-          // if (DEBUG) System.out.println("    found");
-          perReaderTermState.register(
-              termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq());
+      Executor executor = indexSearcher.getExecutor();
+      if (executor == null) {
+        executor = Runnable::run;
+      }
+      List<FutureTask<TermStateInfo>> tasks =
+          context.leaves().stream()
+              .map(
+                  ctx ->
+                      new FutureTask<>(
+                          () -> {
+                            TermsEnum termsEnum = loadTermsEnum(ctx, term);
+                            if (termsEnum != null) {
+                              return new TermStateInfo(
+                                  termsEnum.termState(),
+                                  ctx.ord,
+                                  termsEnum.docFreq(),
+                                  termsEnum.totalTermFreq());
+                            }
+                            return null;
+                          }))
+              .toList();
+      for (FutureTask<TermStateInfo> task : tasks) {
+        if (executor instanceof ThreadPoolExecutor pool) {
+          if ((pool.getCorePoolSize() - pool.getActiveCount()) <= 1) {
+            task.run();

Review Comment:
   > I recently removed another instanceof check done against the executor, as 
well as the conditional offloading to the executor. I am thinking that we 
should try not to go back to a similar mechanism and figure out a long-term 
solution.
   
   Yes, I'm aware of that change and completely agree it's not the ideal way, but 
maybe it is something that could unblock this PR until there is a concrete 
solution to this?
   
   > I caught up with previous discussions and I believe that the suggestion 
that was made was to "run tasks in the current thread if called from a thread 
of the pool". I don't think this conditional achieves that.
   
   Yes, I did try to implement this, but there seems to be no straightforward 
way or API to know whether we are running on an executor thread. Maybe we could 
use `Thread.currentThread().getStackTrace()` to check whether there was an 
earlier call to `run` from the executor?
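   
   One possible alternative to inspecting the stack trace, sketched here only 
for illustration: mark the thread with a `ThreadLocal` flag while it runs a 
task, and run nested tasks inline when the flag is set. `PoolAwareRunner` and 
its `invokeAll` method are hypothetical names, not Lucene classes or APIs, and 
the sketch assumes every task is submitted through the same wrapper instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

/** Hypothetical helper (not a Lucene class): runs tasks inline when already inside a task. */
final class PoolAwareRunner {
  // True only while the current thread executes a task started through this runner.
  private final ThreadLocal<Boolean> inTask = ThreadLocal.withInitial(() -> Boolean.FALSE);
  private final Executor executor;

  PoolAwareRunner(Executor executor) {
    this.executor = executor;
  }

  <T> List<T> invokeAll(List<Callable<T>> callables) {
    List<FutureTask<T>> tasks = new ArrayList<>();
    for (Callable<T> callable : callables) {
      tasks.add(
          new FutureTask<T>(
              () -> {
                boolean previous = inTask.get();
                inTask.set(Boolean.TRUE); // mark this thread for the duration of the task
                try {
                  return callable.call();
                } finally {
                  inTask.set(previous);
                }
              }));
    }
    // If the caller is itself one of our tasks, do not queue behind it: run inline.
    boolean runInline = inTask.get();
    for (FutureTask<T> task : tasks) {
      if (runInline) {
        task.run();
      } else {
        executor.execute(task);
      }
    }
    List<T> results = new ArrayList<>();
    for (FutureTask<T> task : tasks) {
      try {
        results.add(task.get());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        throw new RuntimeException(e.getCause());
      }
    }
    return results;
  }

  public static void main(String[] args) {
    // A single-thread pool would deadlock if the nested tasks were queued instead of run inline.
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      PoolAwareRunner runner = new PoolAwareRunner(pool);
      List<Callable<Integer>> inner = List.of(() -> 1, () -> 2);
      List<Callable<Integer>> outer =
          List.of(() -> runner.invokeAll(inner).stream().mapToInt(Integer::intValue).sum());
      System.out.println(runner.invokeAll(outer)); // prints [3]
    } finally {
      pool.shutdown();
    }
  }
}
```

   The flag only detects tasks that were started through the wrapper, so it 
does not answer the general question of whether an arbitrary thread belongs to 
the pool. It does avoid depending on JDK-internal frame names in the stack 
trace.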



##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +93,58 @@ public TermStates(
   * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect
   *     term statistics. Otherwise, the {@link TermState} objects will be built only when requested
   */
-  public static TermStates build(IndexReaderContext context, Term term, boolean needsStats)
+  public static TermStates build(IndexSearcher indexSearcher, Term term, boolean needsStats)
       throws IOException {
-    assert context != null && context.isTopLevel;
+    IndexReaderContext context = indexSearcher.getTopReaderContext();
+    assert context != null;
     final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context);
     if (needsStats) {
-      for (final LeafReaderContext ctx : context.leaves()) {
-        // if (DEBUG) System.out.println("  r=" + leaves[i].reader);
-        TermsEnum termsEnum = loadTermsEnum(ctx, term);
-        if (termsEnum != null) {
-          final TermState termState = termsEnum.termState();
-          // if (DEBUG) System.out.println("    found");
-          perReaderTermState.register(
-              termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq());
+      Executor executor = indexSearcher.getExecutor();
+      if (executor == null) {
+        executor = Runnable::run;
+      }
+      List<FutureTask<TermStateInfo>> tasks =
+          context.leaves().stream()
+              .map(
+                  ctx ->
+                      new FutureTask<>(
+                          () -> {
+                            TermsEnum termsEnum = loadTermsEnum(ctx, term);
+                            if (termsEnum != null) {
+                              return new TermStateInfo(
+                                  termsEnum.termState(),
+                                  ctx.ord,
+                                  termsEnum.docFreq(),
+                                  termsEnum.totalTermFreq());
+                            }
+                            return null;
+                          }))
+              .toList();
+      for (FutureTask<TermStateInfo> task : tasks) {
+        if (executor instanceof ThreadPoolExecutor pool) {
+          if ((pool.getCorePoolSize() - pool.getActiveCount()) <= 1) {
+            task.run();
+          } else {
+            executor.execute(task);
+          }
+        } else {
+          executor.execute(task);
+        }
+      }
+      for (FutureTask<TermStateInfo> task : tasks) {
+        try {
+          TermStateInfo wrapper = task.get();
+          if (wrapper != null) {
+            perReaderTermState.register(
+                wrapper.getState(),
+                wrapper.getOrdinal(),
+                wrapper.getDocFreq(),
+                wrapper.getTotalTermFreq());
+          }
+        } catch (InterruptedException e) {
+          throw new ThreadInterruptedException(e);
+        } catch (ExecutionException e) {
+          throw new RuntimeException(e.getCause());

Review Comment:
   Makes sense. I'll try to reuse the TaskExecutor instead.



##########
lucene/CHANGES.txt:
##########
@@ -232,11 +172,6 @@ Other
 * GITHUB#12410: Refactor vectorization support (split provider from implementation classes).
   (Uwe Schindler, Chris Hegarty)
 
-* GITHUB#12428: Replace consecutive close() calls and close() calls with null checks with IOUtils.close().
-  (Shubham Chaudhary)
-
-* GITHUB#12512: Remove unused variable in BKDWriter. (Chao Zhang)
-

Review Comment:
   Ahh thanks! I'll fix it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

