benwtrent commented on PR #13124:
URL: https://github.com/apache/lucene/pull/13124#issuecomment-1992172143

   @jpountz 
   
   OK, so I did some benchmarking to look at the impact of this change: 500k docs, flushing segments every 1MB and then force merging.
   
   For all of these, the number of HNSW merge workers is `8` and the number of threads available is `8`.
   
   Baseline single-threaded (no concurrency in HNSW)
   ```
   Indexed: 467617ms
   Force merge: 593037 ms
   ```
   
   Baseline multi-threaded (executor separate from the ConcurrentMergeScheduler, CMS)
   ```
   Indexed: 173143ms
   Force merge: 120341 ms
   ```
   
   Candidate: CMS with 8 merge threads, shared with intra-merge work (what this PR does). This shows that intra-merge work shares its threads with the CMS background merge threads.
   
   There is a ton of merging work going on during indexing, so usually just 1 thread is passed to the intra-merge executor. This is expected and working as intended.
   However, force merge does speed up, because no other merge activity is occurring and it can use all of the provided CMS threads.
   ```
   Indexed: 424924ms
   Force merge: 121705 ms
   ```
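To make the sharing behavior concrete, here is a minimal Java sketch of one way a scheduler could lend spare threads to intra-merge work. It is a hypothetical model, not the code in this PR or a Lucene API: `SharedBudgetExecutor`, `simulate`, and the semaphore-based thread budget are all assumptions. Each running merge holds one permit; an intra-merge task goes to a pool thread only if a permit is free, and otherwise runs inline on the merge thread itself.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedBudgetDemo {

  /**
   * Hypothetical executor: hands a task to a pool thread only if a permit from the
   * shared thread budget is free; otherwise the calling (merge) thread runs it inline.
   */
  static final class SharedBudgetExecutor implements Executor {
    private final Semaphore permits;
    private final ExecutorService pool;
    final AtomicInteger inline = new AtomicInteger();
    final AtomicInteger offloaded = new AtomicInteger();

    SharedBudgetExecutor(Semaphore permits, ExecutorService pool) {
      this.permits = permits;
      this.pool = pool;
    }

    @Override
    public void execute(Runnable task) {
      if (permits.tryAcquire()) {
        offloaded.incrementAndGet();
        pool.execute(() -> {
          try {
            task.run();
          } finally {
            permits.release(); // give the borrowed thread back to the budget
          }
        });
      } else {
        inline.incrementAndGet();
        task.run(); // no spare thread: the merge thread does the work itself
      }
    }
  }

  /** Simulates one merge forking {@code tasks} subtasks; returns {inline, offloaded}. */
  public static int[] simulate(int threadBudget, int activeMerges, int tasks) {
    Semaphore permits = new Semaphore(threadBudget);
    permits.acquireUninterruptibly(activeMerges); // each running merge occupies one thread
    ExecutorService pool = Executors.newFixedThreadPool(threadBudget);
    SharedBudgetExecutor exec = new SharedBudgetExecutor(permits, pool);
    for (int i = 0; i < tasks; i++) {
      exec.execute(() -> {
        // stand-in for a slice of HNSW graph-building work
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new int[] {exec.inline.get(), exec.offloaded.get()};
  }

  public static void main(String[] args) {
    // Busy indexing: all 8 threads are running merges, so intra-merge work stays inline.
    int[] busy = simulate(8, 8, 16);
    // Force merge: only one merge runs, so up to 7 spare threads can be borrowed.
    int[] forceMerge = simulate(8, 1, 16);
    System.out.println("busy:        inline=" + busy[0] + " offloaded=" + busy[1]);
    System.out.println("force merge: inline=" + forceMerge[0] + " offloaded=" + forceMerge[1]);
  }
}
```

In this model, with 8 permits and 8 running merges every task runs inline, which mirrors the "usually just 1 thread" observation during indexing; with a single running merge, as in force merge, tasks can spill onto the free threads.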
   
   To confirm that this is indeed the case, and that indexing wasn't slowed down by some other unexpected overhead, I gave intra-merging 2x as many threads. This effectively removes the limit that CMS places on intra-merges.
   This shows that indexing is now in line with the multi-threaded baseline, effectively using twice as many threads as configured for CMS: 8 merge threads plus 8 intra-merge threads.
   ```
   Indexed: 171825ms
   Force merge: 121886 ms
   ```
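The relative speedups can be computed directly from the numbers above. A small Java sketch that divides the single-threaded baseline time by each configuration's time (the class, map, and method names are illustrative only):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpeedupTable {
  /** Wall-clock times in ms from the benchmark output above: {indexing, force merge}. */
  static final Map<String, long[]> RESULTS = new LinkedHashMap<>();

  static {
    RESULTS.put("baseline single-threaded", new long[] {467_617, 593_037});
    RESULTS.put("baseline multi-threaded", new long[] {173_143, 120_341});
    RESULTS.put("candidate (shared with CMS)", new long[] {424_924, 121_705});
    RESULTS.put("candidate (2x threads)", new long[] {171_825, 121_886});
  }

  /** Speedup of a run relative to a baseline run; values above 1 mean faster. */
  public static double speedup(long baselineMs, long candidateMs) {
    return (double) baselineMs / candidateMs;
  }

  public static void main(String[] args) {
    long[] base = RESULTS.get("baseline single-threaded");
    for (Map.Entry<String, long[]> e : RESULTS.entrySet()) {
      long[] t = e.getValue();
      System.out.printf("%-28s index %.2fx  force merge %.2fx%n",
          e.getKey(), speedup(base[0], t[0]), speedup(base[1], t[1]));
    }
  }
}
```

For the shared-CMS candidate this works out to roughly 1.10x for indexing but roughly 4.87x for force merge, compared with roughly 2.70x and 4.93x for the multi-threaded baseline.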
   
   To make sure we don't slow down too much compared to the single-threaded baseline, I gave 8 workers to HNSW merge but provided only a SameThreadExecutorService.
   
   I did this to justify defaulting to a SameThreadExecutorService, even when HNSW has > 1 worker.
   
   Both indexing and force merge are well within the runtime of the single-threaded baseline.
   ```
   Indexed: 418529ms
   Force merge: 524468 ms
   ```
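For reference, a same-thread executor runs each submitted task on the calling thread, so under this model configuring more HNSW merge workers adds no threads. The sketch below is a hypothetical stand-in built on `java.util.concurrent.AbstractExecutorService`; the name `DirectExecutorService` is made up here, and this is not Lucene's `SameThreadExecutorService`, whose implementation may differ.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical minimal executor that runs every task on the thread that submits it. */
public class DirectExecutorService extends AbstractExecutorService {
  private final Object lock = new Object();
  private int running; // tasks currently executing on some caller's thread
  private boolean shutdown;

  @Override
  public void execute(Runnable command) {
    synchronized (lock) {
      if (shutdown) {
        throw new RejectedExecutionException("executor is shut down");
      }
      running++;
    }
    try {
      command.run(); // no hand-off: the caller (e.g. the merge thread) does the work
    } finally {
      synchronized (lock) {
        running--;
        lock.notifyAll(); // wake any awaitTermination() callers
      }
    }
  }

  @Override
  public void shutdown() {
    synchronized (lock) {
      shutdown = true;
      lock.notifyAll();
    }
  }

  @Override
  public List<Runnable> shutdownNow() {
    shutdown(); // tasks never queue, so there is nothing pending to return
    return Collections.emptyList();
  }

  @Override
  public boolean isShutdown() {
    synchronized (lock) {
      return shutdown;
    }
  }

  @Override
  public boolean isTerminated() {
    synchronized (lock) {
      return shutdown && running == 0;
    }
  }

  @Override
  public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (lock) {
      while (!(shutdown && running == 0)) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          return false;
        }
        TimeUnit.NANOSECONDS.timedWait(lock, remaining);
      }
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    DirectExecutorService exec = new DirectExecutorService();
    Thread caller = Thread.currentThread();
    Callable<Boolean> task = () -> Thread.currentThread() == caller;
    Future<Boolean> result = exec.submit(task);
    System.out.println("ran on caller thread: " + result.get()); // prints "ran on caller thread: true"
    exec.shutdown();
  }
}
```

Because `submit` wraps the task and calls `execute` inline, the future is already complete when it is returned; the design trades parallelism for zero extra threads.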


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

