zhaih commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1961788368
The HNSW concurrent merge is currently implemented so that each worker uses a shared AtomicInteger to coordinate and claims only a small batch of work (1024 documents) at a time. The advantage is that the load is balanced across workers, and I remember this did bring some (5-10%) performance gain when I was testing it. Maybe, instead of specifying a numWorkers per merge, we could default to an expected workload per thread (like 10K) and then allocate numWorkers dynamically, while keeping the current batching scheme to preserve the performance?

One thing I'm worried about with putting everything into CMS is that we're binding intra-segment merge to CMS. To my understanding, using CMS means merging on background threads, so merging becomes nondeterministic, and some users still use SMS (SerialMergeScheduler) or something similar to keep merging deterministic. On the other hand, the HNSW concurrent merge does not affect that aspect at all: no matter how many threads you use, it won't change the determinism of the merge result. So if we bind the two together, would we potentially prevent those users from using intra-segment merges?
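To make the idea concrete, here is a rough sketch of the batching scheme combined with a dynamically chosen worker count. This is not the actual Lucene code: the class `BatchedMergeSketch`, its methods, and the constant `DOCS_PER_WORKER` are hypothetical names, the per-document work is replaced by a visit counter, and I'm assuming the "10K" workload means roughly 10,000 documents per worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical illustration only; none of these names exist in Lucene.
final class BatchedMergeSketch {
  static final int BATCH_SIZE = 1024;         // documents claimed per step (from the comment)
  static final int DOCS_PER_WORKER = 10_000;  // assumed meaning of the "10K" workload

  // Derive the worker count from the amount of work, capped by maxWorkers.
  static int numWorkers(int totalDocs, int maxWorkers) {
    int wanted = (totalDocs + DOCS_PER_WORKER - 1) / DOCS_PER_WORKER; // ceiling division
    return Math.max(1, Math.min(maxWorkers, wanted));
  }

  // Processes every document once and returns the per-document visit counts.
  static int[] processAll(int totalDocs, int maxWorkers) {
    AtomicInteger nextDoc = new AtomicInteger(0);
    AtomicIntegerArray visits = new AtomicIntegerArray(totalDocs);
    int workers = numWorkers(totalDocs, maxWorkers);
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      Runnable worker = () -> {
        while (true) {
          // Atomically claim the next batch; faster workers simply claim more batches.
          int start = nextDoc.getAndAdd(BATCH_SIZE);
          if (start >= totalDocs) {
            break;
          }
          int end = Math.min(start + BATCH_SIZE, totalDocs);
          for (int doc = start; doc < end; doc++) {
            visits.incrementAndGet(doc); // stand-in for adding the doc to the graph
          }
        }
      };
      List<Future<?>> futures = new ArrayList<>();
      for (int w = 0; w < workers; w++) {
        futures.add(pool.submit(worker));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for completion and surface any worker failure
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("worker failed", e);
    } finally {
      pool.shutdown();
    }
    int[] result = new int[totalDocs];
    for (int i = 0; i < totalDocs; i++) {
      result[i] = visits.get(i);
    }
    return result;
  }
}
```

With these assumed constants, a 25,000-document merge capped at 8 workers would get 3 workers, and a small 5,000-document merge would stay on a single worker. The sketch only shows that each document is claimed exactly once; it says nothing about the determinism of the resulting graph.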
