shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1324225960
########## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ########## @@ -86,19 +93,58 @@ public TermStates( * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect * term statistics. Otherwise, the {@link TermState} objects will be built only when requested */ - public static TermStates build(IndexReaderContext context, Term term, boolean needsStats) + public static TermStates build(IndexSearcher indexSearcher, Term term, boolean needsStats) throws IOException { - assert context != null && context.isTopLevel; + IndexReaderContext context = indexSearcher.getTopReaderContext(); + assert context != null; final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context); if (needsStats) { - for (final LeafReaderContext ctx : context.leaves()) { - // if (DEBUG) System.out.println(" r=" + leaves[i].reader); - TermsEnum termsEnum = loadTermsEnum(ctx, term); - if (termsEnum != null) { - final TermState termState = termsEnum.termState(); - // if (DEBUG) System.out.println(" found"); - perReaderTermState.register( - termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq()); + Executor executor = indexSearcher.getExecutor(); + if (executor == null) { + executor = Runnable::run; + } + List<FutureTask<TermStateInfo>> tasks = + context.leaves().stream() + .map( + ctx -> + new FutureTask<>( + () -> { + TermsEnum termsEnum = loadTermsEnum(ctx, term); + if (termsEnum != null) { + return new TermStateInfo( + termsEnum.termState(), + ctx.ord, + termsEnum.docFreq(), + termsEnum.totalTermFreq()); + } + return null; + })) + .toList(); + for (FutureTask<TermStateInfo> task : tasks) { + if (executor instanceof ThreadPoolExecutor pool) { + if ((pool.getCorePoolSize() - pool.getActiveCount()) <= 1) { + task.run(); Review Comment: > I recently removed another instanceof check done against the executor, as well as the conditional offloading to the executor. I am thinking that we should try and not to go back to a similar mechanism and figure out a long-term solution. Yes I'm aware about that change and complete agree its not the ideal way but maybe something that could unblock this PR till there is a concrete solution to this? > I caught up with previous discussions and I believe that the suggestion that was made was to " run tasks in the current thread if called from a thread of the pool". I don't think this conditional achieves that Yes I did try making this but there seems no straightforward way or API to know we are running on an executor thread. Maybe we could use `Thread.currentThread().getStackTrace()` to check if we earlier had a call to `run` from the executor? ########## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ########## @@ -86,19 +93,58 @@ public TermStates( * @param needsStats if {@code true} then all leaf contexts will be visited up-front to collect * term statistics. Otherwise, the {@link TermState} objects will be built only when requested */ - public static TermStates build(IndexReaderContext context, Term term, boolean needsStats) + public static TermStates build(IndexSearcher indexSearcher, Term term, boolean needsStats) throws IOException { - assert context != null && context.isTopLevel; + IndexReaderContext context = indexSearcher.getTopReaderContext(); + assert context != null; final TermStates perReaderTermState = new TermStates(needsStats ? null : term, context); if (needsStats) { - for (final LeafReaderContext ctx : context.leaves()) { - // if (DEBUG) System.out.println(" r=" + leaves[i].reader); - TermsEnum termsEnum = loadTermsEnum(ctx, term); - if (termsEnum != null) { - final TermState termState = termsEnum.termState(); - // if (DEBUG) System.out.println(" found"); - perReaderTermState.register( - termState, ctx.ord, termsEnum.docFreq(), termsEnum.totalTermFreq()); + Executor executor = indexSearcher.getExecutor(); + if (executor == null) { + executor = Runnable::run; + } + List<FutureTask<TermStateInfo>> tasks = + context.leaves().stream() + .map( + ctx -> + new FutureTask<>( + () -> { + TermsEnum termsEnum = loadTermsEnum(ctx, term); + if (termsEnum != null) { + return new TermStateInfo( + termsEnum.termState(), + ctx.ord, + termsEnum.docFreq(), + termsEnum.totalTermFreq()); + } + return null; + })) + .toList(); + for (FutureTask<TermStateInfo> task : tasks) { + if (executor instanceof ThreadPoolExecutor pool) { + if ((pool.getCorePoolSize() - pool.getActiveCount()) <= 1) { + task.run(); + } else { + executor.execute(task); + } + } else { + executor.execute(task); + } + } + for (FutureTask<TermStateInfo> task : tasks) { + try { + TermStateInfo wrapper = task.get(); + if (wrapper != null) { + perReaderTermState.register( + wrapper.getState(), + wrapper.getOrdinal(), + wrapper.getDocFreq(), + wrapper.getTotalTermFreq()); + } + } catch (InterruptedException e) { + throw new ThreadInterruptedException(e); + } catch (ExecutionException e) { + throw new RuntimeException(e.getCause()); Review Comment: Makes sense. I'll try to reuse the TaskExecutor instead. ########## lucene/CHANGES.txt: ########## @@ -232,11 +172,6 @@ Other * GITHUB#12410: Refactor vectorization support (split provider from implementation classes). (Uwe Schindler, Chris Hegarty) -* GITHUB#12428: Replace consecutive close() calls and close() calls with null checks with IOUtils.close(). - (Shubham Chaudhary) - -* GITHUB#12512: Remove unused variable in BKDWriter. (Chao Zhang) - Review Comment: Ahh thanks! I'll fix it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org