benwtrent opened a new issue, #14208: URL: https://github.com/apache/lucene/issues/14208
### Description

I am not sure about other structures, but HNSW merges can allocate a pretty large chunk of memory on heap.

For example: with `max_conn` set to 16, the number of connections on the bottom layer is 32. Because we eagerly create the neighbor arrays, for 9 million vectors the heap allocation balloons to over 2GB (and, depending on the number of layers and other structures, to over 2.5GB of heap).

From what I can tell, merges don't really expose a "here is how much heap I am estimated to use" figure. I wonder if we can do one of the following to help this scenario:

- Make HNSW merges cheaper when it comes to on-heap memory (e.g. merge off heap? make it cheaper some other way?)
- Don't eagerly allocate all the memory required (this complicates multi-threaded merging, and might not actually address the issue)

Note, this is tangential to this other HNSW merging issue, and might actually be its antithesis, since reducing memory allocations sometimes implies slower merging: https://github.com/apache/lucene/issues/12440
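For context, here is a rough back-of-the-envelope sketch of where the 2GB figure plausibly comes from, assuming each neighbor entry carries an int node id plus a float score (roughly the layout of Lucene's `NeighborArray`), and that the bottom layer holds `2 * max_conn` neighbors per node. The class and method names below are illustrative only, not an existing Lucene API.

```java
// Back-of-the-envelope estimate of the heap taken by eagerly allocated
// bottom-layer neighbor arrays during an HNSW merge. Assumes 8 bytes per
// neighbor entry (int node id + float score); constants are illustrative.
public class HnswMergeHeapEstimate {

  static long estimateBottomLayerBytes(long numVectors, int maxConn) {
    int bottomLayerConn = maxConn * 2;                    // bottom layer allows 2 * max_conn neighbors
    long bytesPerNeighbor = Integer.BYTES + Float.BYTES;  // node id + score
    return numVectors * bottomLayerConn * bytesPerNeighbor;
  }

  public static void main(String[] args) {
    long bytes = estimateBottomLayerBytes(9_000_000L, 16);
    System.out.printf("~%.2f GiB for the bottom layer alone%n",
        bytes / (1024.0 * 1024 * 1024));
    // Prints roughly 2.15 GiB, before counting upper layers and other per-node structures.
  }
}
```

Under these assumptions the bottom layer alone accounts for over 2GB at 9 million vectors, which lines up with the numbers above; upper layers and bookkeeping push it toward 2.5GB.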