zhaih commented on PR #15208:
URL: https://github.com/apache/lucene/pull/15208#issuecomment-3555254980

   So I did several more runs with extra logging and realized 
that **this approach is far less efficient than the previous naive approach 
when there are many more threads available than segments to merge**, because 
I simply assign one thread per subgraph. For example, in the 
1M-doc run, the last force merge has only 
two segments to merge, so only two threads do the work even though 8 workers are available:
   ```
   HNSW 1 [2025-11-20T00:40:21.189001876Z; Lucene Merge Thread #0]: build graph from 1000000 vectors, with 8 workers
   HNSW 1 [2025-11-20T00:40:21.189113741Z; Lucene Merge Thread #0]: Starting join set merge for graph 1
   HNSW 1 [2025-11-20T00:40:21.189157504Z; hnsw-merge-1-thread-4]: Starting join set merge for graph 2
   HNSW 1 [2025-11-20T00:40:21.818514767Z; Lucene Merge Thread #0]: Done join set computation for graph 1
   HNSW 1 [2025-11-20T00:40:21.926615241Z; hnsw-merge-1-thread-4]: Done join set computation for graph 2
   HNSW 1 [2025-11-20T00:40:40.564842623Z; hnsw-merge-1-thread-4]: Done adding join set nodes for graph 2
   HNSW 1 [2025-11-20T00:40:42.363841695Z; Lucene Merge Thread #0]: Done adding join set nodes for graph 1
   HNSW 1 [2025-11-20T00:41:53.260574252Z; hnsw-merge-1-thread-4]: Done adding rest of nodes for graph 2
   HNSW 1 [2025-11-20T00:41:53.260704039Z; hnsw-merge-1-thread-4]: Done join set merge for graph 2
   HNSW 1 [2025-11-20T00:42:01.258092293Z; Lucene Merge Thread #0]: Done adding rest of nodes for graph 1
   HNSW 1 [2025-11-20T00:42:01.258182854Z; Lucene Merge Thread #0]: Done join set merge for graph 1
   ```
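
   To make the timings easier to compare, here is a minimal sketch that computes elapsed times from the timestamps in the log above. The class and method names (`MergeLogTimings`, `elapsedMillis`) are made up for illustration and are not part of Lucene; only `java.time` from the JDK is used.

```java
import java.time.Duration;
import java.time.Instant;

public class MergeLogTimings {
  /** Milliseconds between two ISO-8601 instants, in the format printed in the log. */
  static long elapsedMillis(String start, String end) {
    return Duration.between(Instant.parse(start), Instant.parse(end)).toMillis();
  }

  public static void main(String[] args) {
    // Timestamps copied from the "Starting join set merge" and
    // "Done join set merge" lines of the log.
    String start1 = "2025-11-20T00:40:21.189113741Z"; // graph 1, Lucene Merge Thread #0
    String done1 = "2025-11-20T00:42:01.258182854Z";
    String start2 = "2025-11-20T00:40:21.189157504Z"; // graph 2, hnsw-merge-1-thread-4
    String done2 = "2025-11-20T00:41:53.260704039Z";

    System.out.println("graph 1 total ms: " + elapsedMillis(start1, done1));
    System.out.println("graph 2 total ms: " + elapsedMillis(start2, done2));
    // Time the faster thread sits idle while the slower one finishes.
    System.out.println("idle gap ms: " + elapsedMillis(done2, done1));
  }
}
```

   The same helper can be pointed at the intermediate "Done adding join set nodes" and "Done adding rest of nodes" lines to see how the time splits between phases.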
   Also notice that one thread finishes its work about 8 seconds earlier than the 
other, so the work granularity in this approach is too coarse to be efficient. Let me try 
to distribute the work more evenly and see how it goes.
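
   As a rough illustration of why the granularity matters, here is a toy scheduling model, not the PR's code and not a Lucene API. It assumes each graph's work can be cut into independent chunks, which ignores any synchronization cost of concurrent graph insertion. The names `WorkSplitSketch`, `makespan`, and `split` are hypothetical, and the costs 100 and 92 are just the per-graph wall-clock seconds from the log, rounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class WorkSplitSketch {
  /**
   * Greedy list scheduling: each task goes to the worker that becomes free
   * first. Returns the time at which the last worker finishes.
   */
  static long makespan(long[] taskCosts, int workers) {
    PriorityQueue<Long> freeAt = new PriorityQueue<>();
    for (int i = 0; i < workers; i++) {
      freeAt.add(0L);
    }
    long last = 0;
    for (long cost : taskCosts) {
      long finish = freeAt.poll() + cost;
      last = Math.max(last, finish);
      freeAt.add(finish);
    }
    return last;
  }

  /** Cuts every task into chunks of at most chunkSize cost units. */
  static long[] split(long[] taskCosts, long chunkSize) {
    List<Long> chunks = new ArrayList<>();
    for (long cost : taskCosts) {
      for (long remaining = cost; remaining > 0; remaining -= chunkSize) {
        chunks.add(Math.min(chunkSize, remaining));
      }
    }
    return chunks.stream().mapToLong(Long::longValue).toArray();
  }

  public static void main(String[] args) {
    long[] perGraph = {100, 92}; // assumed costs: rounded seconds per graph
    int workers = 8;
    // One task per graph: only two workers are ever busy.
    System.out.println("one task per graph: " + makespan(perGraph, workers));
    // Smaller chunks: all workers share the load.
    System.out.println("chunks of 4: " + makespan(split(perGraph, 4), workers));
  }
}
```

   In this model the one-task-per-graph schedule is bounded by the largest graph no matter how many workers exist, while smaller chunks let the finish time approach total work divided by worker count. Real gains would be smaller to the extent that chunks contend on shared graph state.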


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

