benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1992172143
@jpountz OK, so I did some benchmarking to look at the impact of this change: 500k docs, flushing segments every 1MB and then force merging. For all of these runs, the number of workers is `8` and the number of available threads is `8`.

Baseline, single-threaded (no concurrency in HNSW):

```
Indexed: 467617ms
Force merge: 593037 ms
```

Baseline, multi-threaded (executor separate from CMS):

```
Indexed: 173143ms
Force merge: 120341 ms
```

Candidate: CMS with 8 merge threads, shared with intra-merge (what this PR is doing). This shows that intra-merge work shares its threads with the CMS background merge threads. There is a ton of merging work being done during indexing, so usually just 1 thread is passed to the intra-merge executor. This is the expected result and it is working as intended. However, we do see a speedup in force merge, since no other merge activity is occurring and it can use all the provided CMS threads.

```
Indexed: 424924ms
Force merge: 121705 ms
```

To confirm that this is indeed the case, and that indexing wasn't slowed down by some other weird overhead, I gave 2x as many threads to intra-merging. This effectively removes the limit that CMS places on intra-merges. It shows that indexing is then in line with the current behavior, which basically uses 2x as many threads as configured for CMS (8 merge threads and 8 intra-merge threads).

```
Indexed: 171825ms
Force merge: 121886 ms
```

To make sure we don't slow down too much compared to the single-threaded baseline, I gave 8 workers to the HNSW merge but only provided a SameThreadExecutorService. I did this to justify defaulting to a SameThreadExecutorService even when HNSW has > 1 worker. Both numbers are well within the baseline single-threaded runtime.

```
Indexed: 418529ms
Force merge: 524468 ms
```
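To make the runs above easier to compare, here is a minimal sketch that computes each configuration's speedup relative to the single-threaded baseline. The timings are copied from this comment; the configuration labels and the helper names (`speedup`, `speedups_vs_baseline`) are illustrative choices for this sketch, not anything from the PR or the benchmark tooling.

```python
# Timings in milliseconds as (indexing, force merge), copied from the
# benchmark runs in this comment. The labels are shorthand made up for
# this sketch, not names used by the benchmark itself.
RUNS = {
    "baseline single-threaded": (467617, 593037),
    "baseline multi-threaded (separate executor)": (173143, 120341),
    "candidate (CMS threads shared with intra-merge)": (424924, 121705),
    "candidate (2x threads for intra-merge)": (171825, 121886),
    "8 workers, SameThreadExecutorService": (418529, 524468),
}


def speedup(baseline_ms, run_ms):
    """Return how many times faster run_ms is than baseline_ms."""
    return baseline_ms / run_ms


def speedups_vs_baseline(runs, baseline="baseline single-threaded"):
    """Map each run to its (indexing, force merge) speedup over the baseline."""
    base_index, base_merge = runs[baseline]
    return {
        name: (speedup(base_index, index_ms), speedup(base_merge, merge_ms))
        for name, (index_ms, merge_ms) in runs.items()
    }


if __name__ == "__main__":
    for name, (idx, merge) in speedups_vs_baseline(RUNS).items():
        print(f"{name}: indexing {idx:.2f}x, force merge {merge:.2f}x")
```

Read this way, the candidate indexes only about 1.1x faster than the single-threaded baseline, because the CMS threads are busy with background merges, while its force merge is roughly 4.9x faster, on par with the multi-threaded baseline.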