yaser-aj commented on issue #13883: URL: https://github.com/apache/lucene/issues/13883#issuecomment-2978379030
@lukewilner, @atharvkashyap, @N624-debu, and I are students at Carnegie Mellon University, and we'll be working on this issue as part of a mentored summer course on collaboration in open-source software. Our mentors are @mikemccand and @vigyasharma. We'll be drafting a plan and submitting PRs over the next few weeks. Looking forward to collaborating!

**Our understanding of the problem:**

By default, every `IndexWriter` in a JVM gets its own `ConcurrentMergeScheduler`, which runs the merges selected by the configured `MergePolicy` on background threads. Each merge combines several segments into a single new segment. When a JVM hosts multiple `IndexWriter` objects, there are multiple `ConcurrentMergeScheduler` objects, and each of them uses the JVM's available compute resources without regard to the others. Together they can consume far more RAM, CPU cores, and I/O bandwidth than the user has allocated for merging.

There should be a single `MultiTenantConcurrentMergeScheduler` that coordinates all `ConcurrentMergeScheduler` objects and divides resources sensibly among them. It should handle schedulers being added and removed on the fly, ideally without restarting every `ConcurrentMergeScheduler` each time the number of schedulers changes.

**Thinking out loud:**

Maybe the singleton `MultiTenantConcurrentMergeScheduler` could call [setMaxMergesAndThreads](https://javadoc.io/static/org.apache.lucene/lucene-core/10.2.1/org/apache/lucene/index/ConcurrentMergeScheduler.html#setMaxMergesAndThreads(int,int)) on each `ConcurrentMergeScheduler` while merges are running. This update could happen whenever a `ConcurrentMergeScheduler` is added or removed.
The `MultiTenantConcurrentMergeScheduler` should divide the allocated resources sensibly across all active `ConcurrentMergeScheduler` objects, giving more merge threads to schedulers with heavy merge demand and few or no threads to idle ones. We need an efficient way to decide how to distribute threads based on (1) the continuously changing needs of each `ConcurrentMergeScheduler` and (2) the number of active `ConcurrentMergeScheduler` objects.
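
A minimal sketch of one possible shape for this, under several assumptions: a coordinator holds a fixed JVM-wide merge-thread budget, re-divides it whenever a writer registers or unregisters (or periodically), and weights each writer's share by a demand signal. To keep the sketch runnable without Lucene on the classpath, the per-writer scheduler sits behind an interface whose `setMaxMergesAndThreads(int, int)` mirrors the real `ConcurrentMergeScheduler` method. `MultiTenantMergeCoordinator`, `TenantScheduler`, `pendingMerges()`, and `StubTenant` are hypothetical names for illustration, not Lucene APIs. Because the real `setMaxMergesAndThreads` rejects a thread count below 1, idle writers get a floor of one thread rather than zero.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical view of one IndexWriter's merge scheduler. In Lucene this would be backed by a
 * ConcurrentMergeScheduler, whose setMaxMergesAndThreads(int, int) has the same shape.
 */
interface TenantScheduler {
  /** Hypothetical demand signal: merges currently queued or running for this writer. */
  int pendingMerges();

  void setMaxMergesAndThreads(int maxMergeCount, int maxThreadCount);
}

/** Hypothetical JVM-wide coordinator that splits one merge-thread budget across writers. */
final class MultiTenantMergeCoordinator {
  private final int totalThreads;
  private final int extraQueuedMerges; // added to each thread count to form maxMergeCount
  private final Map<String, TenantScheduler> tenants = new LinkedHashMap<>();

  MultiTenantMergeCoordinator(int totalThreads, int extraQueuedMerges) {
    if (totalThreads < 1 || extraQueuedMerges < 0) {
      throw new IllegalArgumentException("need totalThreads >= 1 and extraQueuedMerges >= 0");
    }
    this.totalThreads = totalThreads;
    this.extraQueuedMerges = extraQueuedMerges;
  }

  synchronized void register(String id, TenantScheduler scheduler) {
    tenants.put(id, scheduler);
    rebalance();
  }

  synchronized void unregister(String id) {
    tenants.remove(id);
    rebalance();
  }

  /** Re-divides the budget; call on register/unregister and periodically as demand shifts. */
  synchronized void rebalance() {
    if (tenants.isEmpty()) {
      return;
    }
    // Snapshot demand once so the whole split uses one consistent view.
    Map<String, Integer> demand = new LinkedHashMap<>();
    long totalDemand = 0;
    for (Map.Entry<String, TenantScheduler> e : tenants.entrySet()) {
      int d = Math.max(0, e.getValue().pendingMerges());
      demand.put(e.getKey(), d);
      totalDemand += d;
    }
    // setMaxMergesAndThreads rejects maxThreadCount < 1, so every writer keeps one thread;
    // with more writers than totalThreads, that floor alone exceeds the budget.
    int spare = Math.max(0, totalThreads - tenants.size());
    Map<String, Integer> threads = new LinkedHashMap<>();
    int handedOut = 0;
    for (Map.Entry<String, Integer> e : demand.entrySet()) {
      int extra = totalDemand == 0 ? 0 : (int) ((long) spare * e.getValue() / totalDemand);
      threads.put(e.getKey(), 1 + extra);
      handedOut += extra;
    }
    // Threads lost to integer rounding go to the neediest writers first.
    int leftover = totalDemand == 0 ? 0 : spare - handedOut;
    List<String> byDemand = new ArrayList<>(demand.keySet());
    byDemand.sort((x, y) -> Integer.compare(demand.get(y), demand.get(x)));
    for (int i = 0; i < leftover; i++) {
      threads.merge(byDemand.get(i % byDemand.size()), 1, Integer::sum);
    }
    for (Map.Entry<String, Integer> e : threads.entrySet()) {
      int n = e.getValue();
      tenants.get(e.getKey()).setMaxMergesAndThreads(n + extraQueuedMerges, n);
    }
  }
}

/** Demo-only stub that records the limits it was given. */
final class StubTenant implements TenantScheduler {
  int pending;
  int maxMerges;
  int maxThreads;

  StubTenant(int pending) {
    this.pending = pending;
  }

  @Override
  public int pendingMerges() {
    return pending;
  }

  @Override
  public void setMaxMergesAndThreads(int maxMergeCount, int maxThreadCount) {
    maxMerges = maxMergeCount;
    maxThreads = maxThreadCount;
  }
}
```

For example, with a budget of 8 threads and writers reporting 6, 2, and 0 pending merges, this split comes out to 5, 2, and 1 threads. Weighting by pending merge count is only one candidate demand signal; merge size or per-index priority could feed the same rebalance step instead.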