zhaih commented on PR #15208: URL: https://github.com/apache/lucene/pull/15208#issuecomment-3555254980
So I did several more runs with additional logging and realized that **this approach is much less efficient than the previous naive approach when there are far more threads available than segments to merge**, because I simply assign one thread per sub graph. For example, when running with 1M docs, the last force merge has only two segments to merge, so only two threads do the work even though 8 workers are available. Here are the logs I captured:

```
HNSW 1 [2025-11-20T00:40:21.189001876Z; Lucene Merge Thread #0]: build graph from 1000000 vectors, with 8 workers
HNSW 1 [2025-11-20T00:40:21.189113741Z; Lucene Merge Thread #0]: Starting join set merge for graph 1
HNSW 1 [2025-11-20T00:40:21.189157504Z; hnsw-merge-1-thread-4]: Starting join set merge for graph 2
HNSW 1 [2025-11-20T00:40:21.818514767Z; Lucene Merge Thread #0]: Done join set computation for graph 1
HNSW 1 [2025-11-20T00:40:21.926615241Z; hnsw-merge-1-thread-4]: Done join set computation for graph 2
HNSW 1 [2025-11-20T00:40:40.564842623Z; hnsw-merge-1-thread-4]: Done adding join set nodes for graph 2
HNSW 1 [2025-11-20T00:40:42.363841695Z; Lucene Merge Thread #0]: Done adding join set nodes for graph 1
HNSW 1 [2025-11-20T00:41:53.260574252Z; hnsw-merge-1-thread-4]: Done adding rest of nodes for graph 2
HNSW 1 [2025-11-20T00:41:53.260704039Z; hnsw-merge-1-thread-4]: Done join set merge for graph 2
HNSW 1 [2025-11-20T00:42:01.258092293Z; Lucene Merge Thread #0]: Done adding rest of nodes for graph 1
HNSW 1 [2025-11-20T00:42:01.258182854Z; Lucene Merge Thread #0]: Done join set merge for graph 1
```

Also notice that one thread finishes its work 8 seconds earlier than the other, so the granularity of this approach is too coarse to be efficient. Let me try to distribute the work more wisely and see how it goes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
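To illustrate the granularity problem described in the comment, here is a minimal Java sketch that contrasts one task per sub graph with smaller node-range tasks on a fixed pool. It is not the code from the PR: `ChunkedMergeSketch`, `Chunk`, `perGraph`, `perRange`, `run`, the chunk size of 2048, and the even 500,000/500,000 split of the 1M vectors are all hypothetical, and the per-node work is a placeholder that does not model the locking or neighbor updates a real concurrent graph builder would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names come from Lucene or the PR. */
class ChunkedMergeSketch {

  /** A half-open range [start, end) of node ordinals inside one sub graph. */
  static final class Chunk {
    final int graph;
    final int start;
    final int end;

    Chunk(int graph, int start, int end) {
      this.graph = graph;
      this.start = start;
      this.end = end;
    }
  }

  /** Coarse split: one task per sub graph, so at most graphSizes.length threads are busy. */
  static List<Chunk> perGraph(int[] graphSizes) {
    List<Chunk> chunks = new ArrayList<>();
    for (int g = 0; g < graphSizes.length; g++) {
      chunks.add(new Chunk(g, 0, graphSizes[g]));
    }
    return chunks;
  }

  /** Finer split: cut every sub graph into ranges of at most chunkSize nodes. */
  static List<Chunk> perRange(int[] graphSizes, int chunkSize) {
    List<Chunk> chunks = new ArrayList<>();
    for (int g = 0; g < graphSizes.length; g++) {
      for (int start = 0; start < graphSizes[g]; start += chunkSize) {
        chunks.add(new Chunk(g, start, Math.min(start + chunkSize, graphSizes[g])));
      }
    }
    return chunks;
  }

  /** Runs every chunk on a fixed pool and returns the total number of nodes processed. */
  static long run(List<Chunk> chunks, int workers) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    AtomicLong processed = new AtomicLong();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Chunk c : chunks) {
        // Placeholder for the real per-node insertion work.
        Runnable task = () -> processed.addAndGet(c.end - c.start);
        futures.add(pool.submit(task));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for completion and propagate any worker failure
      }
    } finally {
      pool.shutdown();
    }
    return processed.get();
  }

  public static void main(String[] args) throws Exception {
    int[] graphSizes = {500_000, 500_000}; // assumed even split of the 1M vectors
    int workers = 8; // worker count taken from the log line above

    List<Chunk> coarse = perGraph(graphSizes);
    List<Chunk> fine = perRange(graphSizes, 2048); // 2048 is an arbitrary chunk size

    System.out.println("tasks with one task per graph: " + coarse.size());
    System.out.println("tasks with range chunks: " + fine.size());
    System.out.println("nodes processed: " + run(fine, workers));
  }
}
```

With only two tasks, a pool of 8 workers can keep at most two threads busy, and whichever task finishes first leaves its thread idle. Smaller chunks let idle workers pick up the remaining ranges, at the cost of more scheduling overhead, so the chunk size would need tuning against real merges.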
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
