jpountz commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1961602616
Maybe some of these things are too ambitious, but ideally I'd like it to work this way.

`ConcurrentMergeScheduler` already tracks a `maxMergeCount`, which controls the max number of running merges, and a `maxThreadCount`, which bounds the number of threads that merges may use. Ideally I'd like `maxThreadCount` to cover both the threads used for inter-merge concurrency and the threads used for intra-merge concurrency. So this is similar to your first suggestion, except that I'm bounding the total number of threads to `maxThreadCount` rather than `maxThreadCount + maxMergeCount`.

Intra-merge concurrency would take advantage of the fact that there will sometimes be fewer active merges than threads. E.g. we could have a pool of threads for intra-merge concurrency that would try to ensure that its number of active threads is always less than or equal to `max(0, maxThreadCount - mergeThreads.size())`. For instance, `Executor#execute` could be implemented such that it runs the runnable in the current thread if the number of active merges plus the number of active threads in the intra-merge thread pool is greater than or equal to `maxThreadCount`. Otherwise it would fork to the intra-merge thread pool.

Concurrent merging for vectors wants to know the number of available workers today, but maybe we can change the logic (like you suggested) to split the doc ID space into some number of slices, e.g. `max(128, maxDoc / 2^16)`, and sequentially send these slices to `Executor#execute` (sometimes running in the same thread, sometimes forked to the intra-merge thread pool), except the last one, which would be forced to run in the current thread (like we used to do in `IndexSearcher` until recently).
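To make the `Executor#execute` idea above concrete, here is a minimal sketch of one way it might look. It is not Lucene code: `BudgetedMergeExecutor`, the `activeMerges` supplier (standing in for `mergeThreads.size()`), and the counter are all hypothetical names, and the check against the merge count is only best-effort since merges can start or finish concurrently.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch: forks a task to an intra-merge pool only while
 * (active merges + active intra-merge threads) is below maxThreadCount,
 * and otherwise runs the task in the calling thread.
 */
final class BudgetedMergeExecutor implements Executor {
  private final int maxThreadCount;
  private final IntSupplier activeMerges; // assumption: stands in for mergeThreads.size()
  private final Executor intraMergePool;  // assumption: a separately managed thread pool
  private final AtomicInteger activeIntraMergeThreads = new AtomicInteger();

  BudgetedMergeExecutor(int maxThreadCount, IntSupplier activeMerges, Executor intraMergePool) {
    this.maxThreadCount = maxThreadCount;
    this.activeMerges = activeMerges;
    this.intraMergePool = intraMergePool;
  }

  @Override
  public void execute(Runnable task) {
    while (true) {
      int active = activeIntraMergeThreads.get();
      if (activeMerges.getAsInt() + active >= maxThreadCount) {
        // No spare thread budget: run in the current (merge) thread.
        task.run();
        return;
      }
      // Reserve one slot of the budget; retry if another caller raced us.
      if (activeIntraMergeThreads.compareAndSet(active, active + 1)) {
        break;
      }
    }
    try {
      intraMergePool.execute(() -> {
        try {
          task.run();
        } finally {
          // Release the reserved slot once the forked task finishes.
          activeIntraMergeThreads.decrementAndGet();
        }
      });
    } catch (RejectedExecutionException e) {
      // Pool refused the task: release the slot and fall back to the caller.
      activeIntraMergeThreads.decrementAndGet();
      task.run();
    }
  }
}
```

With this shape, a merge that sends its slices through `execute` degrades to plain sequential execution whenever all `maxThreadCount` threads are already busy, which is what keeps the total bounded by `maxThreadCount` rather than `maxThreadCount + maxMergeCount`.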