Re: [I] A multi-tenant ConcurrentMergeScheduler [lucene]

via GitHub Mon, 16 Jun 2025 15:41:48 -0700


yaser-aj commented on issue #13883:
URL: https://github.com/apache/lucene/issues/13883#issuecomment-2978379030


   @lukewilner, @atharvkashyap, @N624-debu, and I are students from Carnegie 
Mellon University, and we’ll be working on this issue as part of a mentored 
summer course focused on collaboration in open-source software. Our mentors are 
@mikemccand and @vigyasharma. We’ll be drafting a plan and submitting PRs over 
the next few weeks. Looking forward to collaborating!
   
   **Our understanding of the problem:**
   
   Every `IndexWriter` within a running JVM instantiates its own 
`ConcurrentMergeScheduler` object, which runs the merges selected by the 
configured `MergePolicy`, each merge combining several segments into a single 
larger segment. The problem is that when there are multiple `IndexWriter` 
objects, a separate `ConcurrentMergeScheduler` object is instantiated for each, 
and all of them use the compute resources available to the running JVM without 
regard to each other. This leads to excessive resource usage (RAM, CPU cores, 
and I/O), well beyond what the user has allocated for merging.
   
   There has to be one `MultiTenantConcurrentMergeScheduler` object that 
coordinates how all `ConcurrentMergeScheduler` objects operate and divides 
resources sensibly across them. It should handle the addition and removal of 
`ConcurrentMergeScheduler` objects on the fly, ideally without needing to 
restart all `ConcurrentMergeScheduler` objects every time their number 
changes.
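
   To make the "add and remove on the fly" idea concrete, here is a minimal 
sketch in plain Java. Everything in it is hypothetical: `MergeTenantRegistry`, 
the tenant names, and the `IntConsumer` callback are illustrative stand-ins, 
not Lucene APIs. In a real integration the callback might wrap a call that 
updates a scheduler's thread limit. The even split and the floor of one thread 
per tenant are assumptions made only for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

/** Hypothetical coordinator; names are illustrative and not part of Lucene. */
public class MergeTenantRegistry {
  private final int totalThreads;
  // Each tenant is represented by a callback that applies its new thread limit.
  private final Map<String, IntConsumer> tenants = new LinkedHashMap<>();

  public MergeTenantRegistry(int totalThreads) {
    if (totalThreads < 1) {
      throw new IllegalArgumentException("totalThreads must be >= 1");
    }
    this.totalThreads = totalThreads;
  }

  /** Adds (or replaces) a tenant and pushes new limits to every tenant. */
  public synchronized void register(String name, IntConsumer applyLimit) {
    tenants.put(name, applyLimit);
    rebalance();
  }

  /** Removes a tenant; the remaining tenants pick up the freed threads. */
  public synchronized void unregister(String name) {
    if (tenants.remove(name) != null) {
      rebalance();
    }
  }

  // Even split of the budget; no tenant is restarted, only its limit changes.
  private void rebalance() {
    int n = tenants.size();
    if (n == 0) {
      return;
    }
    int base = totalThreads / n;
    int extra = totalThreads % n;
    int i = 0;
    for (IntConsumer applyLimit : tenants.values()) {
      // Assumed floor of one thread per tenant; with more tenants than
      // threads this oversubscribes the budget.
      int limit = Math.max(1, base + (i < extra ? 1 : 0));
      applyLimit.accept(limit);
      i++;
    }
  }
}
```

   Registering a second tenant against a budget of four threads would move the 
first tenant from four threads to two, and unregistering it would move the 
first tenant back to four, without recreating either one.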
   
   **Thinking out loud:**
   
   Maybe we can call 
[setMaxMergesAndThreads](https://javadoc.io/static/org.apache.lucene/lucene-core/10.2.1/org/apache/lucene/index/ConcurrentMergeScheduler.html#setMaxMergesAndThreads(int,int))
 from the singleton `MultiTenantConcurrentMergeScheduler` object while merges 
are running across all `ConcurrentMergeScheduler` objects. This update can 
happen whenever a `ConcurrentMergeScheduler` is added or removed. It should 
divide the allocated resources across all active `ConcurrentMergeScheduler` 
objects, giving more merge threads to the busier `ConcurrentMergeScheduler` 
objects and fewer, or none at all, to the idle ones. We have to come up with 
an efficient way to decide how to distribute threads based on (1) the 
continuously changing needs of each `ConcurrentMergeScheduler` object and 
(2) the number of active `ConcurrentMergeScheduler` objects.
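
   One possible way to turn criteria (1) and (2) into numbers is a 
proportional split with largest-remainder rounding. The sketch below is plain 
Java and only illustrates that idea: `MergeThreadBudget`, the tenant names, 
and the use of pending-merge counts as the measure of "need" are all 
assumptions, not existing Lucene code. It also lets idle tenants drop to zero 
threads; if I recall correctly, `setMaxMergesAndThreads` expects a thread 
count of at least 1, so a real version would need to handle idle schedulers 
some other way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Lucene. */
public class MergeThreadBudget {

  /**
   * Splits totalThreads across tenants in proportion to their pending merges.
   * Tenants with no pending merges receive zero threads.
   */
  public static Map<String, Integer> allocate(
      Map<String, Integer> pendingMerges, int totalThreads) {
    if (totalThreads < 0) {
      throw new IllegalArgumentException("totalThreads must be >= 0");
    }
    long totalDemand = 0;
    for (int demand : pendingMerges.values()) {
      if (demand < 0) {
        throw new IllegalArgumentException("pending merges must be >= 0");
      }
      totalDemand += demand;
    }

    Map<String, Integer> shares = new LinkedHashMap<>();
    // Enough threads for everyone: each tenant gets exactly what it needs.
    if (totalDemand <= totalThreads) {
      shares.putAll(pendingMerges);
      return shares;
    }

    // Otherwise start from the floor of each proportional share ...
    Map<String, Long> remainders = new LinkedHashMap<>();
    int assigned = 0;
    for (Map.Entry<String, Integer> e : pendingMerges.entrySet()) {
      long scaled = (long) e.getValue() * totalThreads;
      int share = (int) (scaled / totalDemand);
      shares.put(e.getKey(), share);
      remainders.put(e.getKey(), scaled % totalDemand);
      assigned += share;
    }

    // ... then hand the leftover threads to the largest remainders.
    List<String> order = new ArrayList<>(pendingMerges.keySet());
    order.sort(
        Comparator.comparingLong((String k) -> remainders.get(k)).reversed());
    int leftover = totalThreads - assigned;
    for (int i = 0; i < leftover; i++) {
      shares.merge(order.get(i), 1, Integer::sum);
    }
    return shares;
  }
}
```

   The allocation would be recomputed whenever a scheduler is added or removed, 
or when the pending-merge counts change noticeably. How often to recompute, 
and what signal best reflects a scheduler's need, are open questions.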


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org