zhaih commented on PR #13124:
URL: https://github.com/apache/lucene/pull/13124#issuecomment-1961788368

   The way HNSW concurrent merge is currently implemented, each worker uses a 
shared AtomicInteger to coordinate and claims only a small batch of work 
(1024 documents) at a time. The advantage is that we can load balance 
between workers, and I remember this did bring some (5-10%) performance gain 
when I was testing it. 
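
   To make the batching scheme concrete, here is a minimal sketch of workers claiming fixed-size batches through a shared AtomicInteger. It is not the actual Lucene code: the class name `BatchClaimSketch`, the method `run`, and the per-worker counters are hypothetical, and the inner loop only counts documents where the real merge would insert each one into the graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names are not taken from Lucene.
final class BatchClaimSketch {
  // Batch size mentioned in the comment above.
  static final int BATCH_SIZE = 1024;

  /** Runs numWorkers threads over totalDocs and returns how many docs each one handled. */
  static int[] run(int totalDocs, int numWorkers) {
    AtomicInteger nextDoc = new AtomicInteger(0); // shared cursor over doc ordinals
    int[] processed = new int[numWorkers];        // slot i is written only by worker i
    List<Thread> threads = new ArrayList<>();
    for (int w = 0; w < numWorkers; w++) {
      final int id = w;
      Thread t = new Thread(() -> {
        while (true) {
          // Atomically claim the next batch; no two workers get the same range.
          int start = nextDoc.getAndAdd(BATCH_SIZE);
          if (start >= totalDocs) {
            break; // nothing left to claim
          }
          int end = Math.min(start + BATCH_SIZE, totalDocs);
          for (int doc = start; doc < end; doc++) {
            processed[id]++; // placeholder for "add doc to the HNSW graph"
          }
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join(); // join also makes each worker's counter visible to the caller
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
    return processed;
  }
}
```

   Because a fast worker simply comes back for another batch sooner, the split between workers adjusts itself at run time, which is the load-balancing effect described above.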
   
   Maybe, instead of specifying numWorkers per merge, we can set a default 
expected workload per thread (like 10K), and then allocate numWorkers 
dynamically? But still keep the current batch-based way of merging to keep the 
performance?
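
   One way the dynamic allocation could look, as a rough sketch: divide the number of documents by the expected workload per thread, round up, and clamp the result. Everything here is an assumption for illustration: the class `WorkerCountSketch`, the method `numWorkers`, reading "10K" as 10,000 documents, and the `maxWorkers` cap (the proposal above does not mention an upper bound).

```java
// Hypothetical illustration only; names are not taken from Lucene.
final class WorkerCountSketch {
  // Assumed expected workload per worker, reading "10K" as documents.
  static final int DOCS_PER_WORKER = 10_000;

  /** Picks a worker count from the merge size, between 1 and maxWorkers. */
  static int numWorkers(int totalDocs, int maxWorkers) {
    // Ceiling division in long arithmetic to avoid int overflow near Integer.MAX_VALUE.
    long wanted = ((long) totalDocs + DOCS_PER_WORKER - 1) / DOCS_PER_WORKER;
    // Always use at least one worker, and never more than the assumed cap.
    return (int) Math.max(1L, Math.min(wanted, (long) maxWorkers));
  }
}
```

   With this shape, a small merge stays single-threaded, and the batch-claiming loop can remain unchanged since it works for any number of workers.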
   
   One thing I'm worried about with putting everything into CMS 
(ConcurrentMergeScheduler) is that we're binding intra-segment merge to CMS. 
To my understanding, using CMS means we're using background threads to merge, 
so merging becomes nondeterministic, and some users still use SMS 
(SerialMergeScheduler) or something similar to keep merging deterministic. On 
the other hand, the HNSW concurrent merge does not affect that aspect at all: 
no matter how many threads you use, it won't affect the determinism of the 
merge result. So if we bind those two together, would we potentially prevent 
some users from using intra-segment merges?



