[GitHub] [lucene] uschindler commented on a diff in pull request #12183: Make TermStates#build concurrent

via GitHub Wed, 13 Sep 2023 02:44:15 -0700


uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324259855



##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +93,58 @@ public TermStates(
    * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect
    *     term statistics. Otherwise, the {@link TermState} objects will be built only when requested
    */
-  public static TermStates build(IndexReaderContext context, Term term, boolean needsStats)
+  public static TermStates build(IndexSearcher indexSearcher, Term term, boolean needsStats)
       throws IOException {
-    assert context != null && context.isTopLevel;
+    IndexReaderContext context = indexSearcher.getTopReaderContext();
+    assert context != null;
     final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context);
     if (needsStats) {
-      for (final LeafReaderContext ctx : context.leaves()) {
-        // if (DEBUG) System.out.println("  r=" + leaves[i].reader);
-        TermsEnum termsEnum = loadTermsEnum(ctx, term);
-        if (termsEnum != null) {
-          final TermState termState = termsEnum.termState();
-          // if (DEBUG) System.out.println("    found");
-          perReaderTermState.register(
-              termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq());
+      Executor executor = indexSearcher.getExecutor();
+      if (executor == null) {
+        executor = Runnable::run;
+      }
+      List<FutureTask<TermStateInfo>> tasks =
+          context.leaves().stream()
+              .map(
+                  ctx ->
+                      new FutureTask<>(
+                          () -> {
+                            TermsEnum termsEnum = loadTermsEnum(ctx, term);
+                            if (termsEnum != null) {
+                              return new TermStateInfo(
+                                  termsEnum.termState(),
+                                  ctx.ord,
+                                  termsEnum.docFreq(),
+                                  termsEnum.totalTermFreq());
+                            }
+                            return null;
+                          }))
+              .toList();
+      for (FutureTask<TermStateInfo> task : tasks) {
+        if (executor instanceof ThreadPoolExecutor pool) {
+          if ((pool.getCorePoolSize() - pool.getActiveCount()) <= 1) {
+            task.run();

Review Comment:
   If you really want to do this, use StackWalker. But take care: it might require additional privileges if used in the wrong way.
   
   The main problem with instanceof checks is that many other implementations of Executor may deadlock in similar ways. Think also about virtual threads in Java 21!
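
   A minimal sketch of the failure mode, not taken from the PR: a task running on a single-threaded executor submits a nested task to the same executor and waits for it. The class and method names are hypothetical, and the timeout exists only so the demo terminates; with a plain `get()` the outer task would wait forever. The executor from `Executors.newSingleThreadExecutor()` is a delegating wrapper rather than a `ThreadPoolExecutor`, so an instanceof check like the one in the diff would not catch it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NestedExecutorDeadlock {

  /** Returns true if the outer task timed out while waiting for its nested task. */
  static boolean nestedSubmitStalls() {
    // Single worker thread; the returned wrapper is not a ThreadPoolExecutor.
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<Boolean> outer =
          pool.submit(
              () -> {
                // The only worker is busy running this task, so the nested
                // task stays in the queue and can never start.
                Future<Integer> inner = pool.submit(() -> 42);
                try {
                  inner.get(200, TimeUnit.MILLISECONDS);
                  return false;
                } catch (TimeoutException e) {
                  return true; // a plain get() would block forever here
                }
              });
      return outer.get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("nested submit stalled: " + nestedSubmitStalls());
  }
}
```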
   
   I think the only way is to do away with the stupid old Executor abstraction and use the Fork/Join framework of the JDK. It can fork from inside threads that were themselves already forked, and it handles that without deadlocks.
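
   As a rough illustration of that property (a sketch with hypothetical class names, not code from the PR): the task below forks sub-tasks from inside a worker thread and joins them, on a `ForkJoinPool` limited to a parallelism of 1. It still completes, because a worker that calls `join()` can run the pending forked task itself instead of blocking on it.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class NestedForkJoin {

  /** Sums the integers in [lo, hi) by recursively forking sub-ranges. */
  static final class RangeSum extends RecursiveTask<Long> {
    private final int lo;
    private final int hi;

    RangeSum(int lo, int hi) {
      this.lo = lo;
      this.hi = hi;
    }

    @Override
    protected Long compute() {
      if (hi - lo <= 4) {
        long sum = 0;
        for (int i = lo; i < hi; i++) {
          sum += i;
        }
        return sum;
      }
      int mid = (lo + hi) >>> 1;
      RangeSum left = new RangeSum(lo, mid);
      left.fork(); // forked from inside a pool worker thread
      long right = new RangeSum(mid, hi).compute();
      return left.join() + right; // join does not starve the pool
    }
  }

  static long sumWithParallelism(int parallelism, int n) {
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      return pool.invoke(new RangeSum(0, n));
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(sumWithParallelism(1, 100));
  }
}
```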
   
   We need to change IndexSearcher to use ForkJoinPool instead of a plain Executor: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
   Then all tasks should be ForkJoinTasks rather than Runnables: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html
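
   To make the shape of that concrete, here is a hedged sketch of per-leaf work expressed as ForkJoinTasks. It does not use any Lucene API: `LeafStat`, `collect`, and the list-of-strings "leaves" are hypothetical stand-ins for `TermStateInfo`, `TermStates#build`, and the leaf contexts in the diff. Each leaf gets one task that returns null when the term is absent, mirroring the `return null` branch in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class LeafTasksSketch {

  /** Hypothetical stand-in for the per-leaf result (TermStateInfo in the diff). */
  static final class LeafStat {
    final int ord;
    final int docFreq;

    LeafStat(int ord, int docFreq) {
      this.ord = ord;
      this.docFreq = docFreq;
    }

    @Override
    public String toString() {
      return "ord=" + ord + " docFreq=" + docFreq;
    }
  }

  /** Forks one task per leaf inside the pool, then joins them in leaf order. */
  static List<LeafStat> collect(ForkJoinPool pool, List<List<String>> leaves, String term) {
    Callable<List<LeafStat>> all =
        () -> {
          List<ForkJoinTask<LeafStat>> tasks = new ArrayList<>();
          for (int ord = 0; ord < leaves.size(); ord++) {
            final int leafOrd = ord;
            Callable<LeafStat> perLeaf =
                () -> {
                  int docFreq = 0;
                  for (String t : leaves.get(leafOrd)) {
                    if (t.equals(term)) {
                      docFreq++;
                    }
                  }
                  // null means "term not present in this leaf"
                  return docFreq == 0 ? null : new LeafStat(leafOrd, docFreq);
                };
            tasks.add(ForkJoinTask.adapt(perLeaf).fork());
          }
          List<LeafStat> stats = new ArrayList<>();
          for (ForkJoinTask<LeafStat> task : tasks) {
            LeafStat stat = task.join();
            if (stat != null) {
              stats.add(stat);
            }
          }
          return stats;
        };
    return pool.invoke(ForkJoinTask.adapt(all));
  }
}
```

   Because the per-leaf tasks are forked from a task that already runs inside the pool, the same method could be called from another ForkJoinTask without a special case for "am I already on a pool thread".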
   
   I think rewriting the concurrency in IndexSearcher to use Fork/Join should be a separate PR.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

