mikemccand commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1235212562
##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +92,58 @@ public TermStates(
    * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect
    * term statistics. Otherwise, the {@link TermState} objects will be built only when requested
    */
-  public static TermStates build(IndexReaderContext context, Term term, boolean needsStats)
+  public static TermStates build(
+      IndexSearcher indexSearcher, IndexReaderContext context, Term term, boolean needsStats)
       throws IOException {
     assert context != null && context.isTopLevel;
     final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context);
     if (needsStats) {
-      for (final LeafReaderContext ctx : context.leaves()) {
-        // if (DEBUG) System.out.println(" r=" + leaves[i].reader);
-        TermsEnum termsEnum = loadTermsEnum(ctx, term);
-        if (termsEnum != null) {
-          final TermState termState = termsEnum.termState();
-          // if (DEBUG) System.out.println(" found");
-          perReaderTermState.register(
-              termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq());
+      Executor executor = indexSearcher.getExecutor();
+      boolean isShutdown = false;
+      if (executor instanceof ExecutorService) {
+        isShutdown = ((ExecutorService) executor).isShutdown();
+      }
+      if (executor != null && isShutdown == false) {
+        // build term states concurrently
+        List<FutureTask<Integer>> tasks =
+            context.leaves().stream()
+                .map(
+                    ctx ->
+                        new FutureTask<>(
+                            () -> {
+                              TermsEnum termsEnum = loadTermsEnum(ctx, term);
+                              if (termsEnum != null) {
+                                final TermState termState = termsEnum.termState();
+                                perReaderTermState.register(
+                                    termState,
+                                    ctx.ord,
+                                    termsEnum.docFreq(),
+                                    termsEnum.totalTermFreq());
+                              }
+                              return 0;
+                            }))
+                .toList();
+        for (FutureTask<Integer> task : tasks) {
+          executor.execute(task);
+        }
+        for (FutureTask<Integer> task : tasks) {
+          try {
+            task.get();
+          } catch (InterruptedException | ExecutionException e) {
+            throw new RuntimeException(e.getMessage());
+          }
+        }

Review Comment:
> Another aspect that gives a bit of headache is this blocking wait while the tasks are completing.

Well, remember that it is effectively already a blocking (single-threaded) implementation today, so the added wait in this fork/join approach is really no different.

Still, I agree it would be wonderful to explore a more async approach overall, with continuations that run when each async task completes. That would be a natural way to distribute the IO required to execute costly queries. But I don't think we need to try to do this in this PR -- baby steps, progress not perfection.
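For illustration only, here is a minimal sketch of the continuation-based alternative the comment mentions. It is not code from this PR. It swaps the `FutureTask` + `get()` fork/join pattern for `java.util.concurrent.CompletableFuture`. Lucene types are deliberately left out: the `Stats` record and `loadLeafStats` method are hypothetical stand-ins for a per-leaf `loadTermsEnum(ctx, term)` lookup and its `docFreq()`/`totalTermFreq()`. Each leaf lookup is submitted to the executor, and the merge of per-leaf results runs as a continuation (`allOf(...).thenApply(...)`) once every lookup has finished. The caller therefore receives a future instead of blocking.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class AsyncTermStatsSketch {

  // Hypothetical stand-in for a per-leaf docFreq/totalTermFreq pair (not a Lucene type).
  record Stats(int docFreq, long totalTermFreq) {}

  // Hypothetical stand-in for loadTermsEnum(ctx, term) + docFreq()/totalTermFreq():
  // pretends leaf `ord` contains the term in (ord + 1) docs, twice per doc.
  static Stats loadLeafStats(int ord) {
    int docFreq = ord + 1;
    return new Stats(docFreq, 2L * docFreq);
  }

  // Submits one lookup per leaf and returns a future of the summed stats.
  // The calling thread never waits here; the summing step is a continuation.
  static CompletableFuture<Stats> aggregateAsync(int numLeaves, ExecutorService executor) {
    List<CompletableFuture<Stats>> perLeaf =
        IntStream.range(0, numLeaves)
            .mapToObj(ord -> CompletableFuture.supplyAsync(() -> loadLeafStats(ord), executor))
            .toList();
    // allOf completes once every per-leaf future has completed; thenApply then runs the merge.
    return CompletableFuture.allOf(perLeaf.toArray(new CompletableFuture<?>[0]))
        .thenApply(
            ignored -> {
              int docFreq = 0;
              long totalTermFreq = 0;
              for (CompletableFuture<Stats> f : perLeaf) {
                Stats s = f.join(); // already complete here, so join() returns immediately
                docFreq += s.docFreq();
                totalTermFreq += s.totalTermFreq();
              }
              return new Stats(docFreq, totalTermFreq);
            });
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      aggregateAsync(5, executor)
          .thenAccept(
              s ->
                  System.out.println(
                      "docFreq=" + s.docFreq() + " totalTermFreq=" + s.totalTermFreq()))
          .join(); // the only blocking wait, at the outermost edge, so main() prints before exiting
    } finally {
      executor.shutdown();
    }
  }
}
```

Running `main` prints `docFreq=15 totalTermFreq=30`. The sketch only moves the blocking `join()` outward to the outermost caller. Actually avoiding the wait inside Lucene would presumably require callers of something like `TermStates.build` to accept a future. That is the larger change the comment suggests deferring beyond this PR. Merging the per-leaf results in a single continuation also means only one thread combines them.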