weizijun opened a new issue, #14554:
URL: https://github.com/apache/lucene/issues/14554

   ### Description
   
   When there are many shards to merge, merging vector data can easily lead to
memory overflow and high CPU cost.
   The index.merge.scheduler.max_thread_count parameter can't actually control the
number of merge threads; it only pauses writeByte calls via MergeRateLimiter once
the number of merge threads exceeds max_thread_count.
   But by the time that pause takes effect, the OnHeapHnswGraph has already been
built, and it consumes so much memory that the Java heap is exhausted.
   This problem occurs easily when a data node with a 32G heap holds 2-3TB of
vector documents (with BBQ, a node can hold that much data).
   The PR https://github.com/apache/lucene/pull/14527 can reduce the heap
footprint, but it doesn't solve the problem completely.
   Is there any solution to this problem?
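   One possible direction is to block merges *before* any per-merge heap is
allocated, rather than rate-limiting writes after the HNSW graph already exists.
The sketch below is hypothetical and not Lucene's actual scheduler API (the class
`BoundedMergeSketch` and its method names are invented for illustration); it just
shows the semaphore-gating idea: a merge thread waits for a permit first, so at
most N graph builds can be in flight at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: cap the number of concurrent merge bodies with a
 * semaphore, acquired BEFORE the memory-hungry graph build starts. This is
 * the opposite of pausing writeByte after the graph is already on heap.
 */
public class BoundedMergeSketch {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    public BoundedMergeSketch(int maxConcurrentMerges) {
        this.permits = new Semaphore(maxConcurrentMerges);
    }

    /** Blocks until a permit is free, then runs the merge body under the cap. */
    public void merge(Runnable buildGraphAndWrite) throws InterruptedException {
        permits.acquire(); // block here, before any OnHeapHnswGraph would exist
        try {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // track peak concurrency
            buildGraphAndWrite.run();              // graph build happens gated
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    public int peakConcurrency() { return peak.get(); }

    public static void main(String[] args) throws Exception {
        BoundedMergeSketch gate = new BoundedMergeSketch(2); // cap: 2 merges
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                try {
                    gate.merge(() -> {
                        try { Thread.sleep(50); } catch (InterruptedException e) { }
                    });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("peak concurrent merges = " + gate.peakConcurrency());
    }
}
```

   With 8 merge threads submitted against a cap of 2, the peak concurrency never
exceeds 2, so at most two graphs' worth of heap is live at any moment; the
trade-off is that waiting merges hold their thread idle instead of making slow
progress.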


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

