mikemccand commented on issue #13883:
URL: https://github.com/apache/lucene/issues/13883#issuecomment-3039675356

   > > Using the same thread pool for indexing and merging. This way if the thread pool gets full of merges, this will naturally push back on indexing.
   > 
   > +1 to this - we have a problem today where force-merge-deletes runs way too long, blocking additional merges, but indexing continues, and we see deletions and overall index size continually growing; it's unhealthy, and maybe back-pressure on indexing would help. OTOH we may just be asking force-merge to do too much ...
   
   Well, this is sort of a self-inflicted wound (at Amazon product search) :)
   
   CMS will already apply backpressure (stalling indexing threads that keep writing new segments) when there is a backlog of merges.  It has `setMaxMergesAndThreads` for this, with two settings: `maxThreadCount` says how many merge threads may run at once, and `maxMergeCount` caps how many merges may be pending or running at any time (`maxMergeCount >= maxThreadCount`).  As @jpountz describes, it acts like a fixed-size queue (`maxMergeCount - maxThreadCount` slots), and any attempted merge beyond that will block ongoing indexing until merges catch up.
   
   "We" (Amazon product search) set `maxMergeCount` and `maxThreadCount` way 
too high (100 I think!), allowing basically unbounded merge backlog and no 
indexing backpressure.  So let this be a warning to Lucene users!  Don't 
blindly undo CMS's limits ...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

