benwtrent commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2847706746
@weizijun It should be fixed. An estimate that is off by more than 2x is pretty bad, because this estimate is used to decide how often flushes should occur, among other things.

There are a couple of ways this can be fixed. A simple one would be to give `NeighborArray` an optional callback that calls a package-private method on `OnHeapHnswGraph`, so that each array's individual estimate can be adjusted as the array grows: `NeighborArray(OnHeapHnswGraph::updateEstimate...)` or something similar. The RAM estimate in `OnHeapHnswGraph` then becomes the accumulation of those per-array estimates as they evolve.

We need to be careful about multi-threading there, since many node updates can be happening at the same time. So this inner accumulator likely needs to be a `LongAccumulator`, and it should also assert that its value is always positive.

Please also adjust the inner arrays to enforce their maximum length. This way we never over-allocate.
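A minimal sketch of the callback-plus-accumulator idea follows, assuming plain `int[]`/`float[]` storage and a crude cost model of 8 bytes per slot (array headers ignored). `SketchNeighborArray`, `SketchGraph`, `newNeighborArray()` and `ramBytesUsed()` are hypothetical stand-ins, not Lucene's actual `NeighborArray`/`OnHeapHnswGraph` API, and `updateEstimate(long)` only mirrors the name suggested above. Each array reports the byte delta whenever it grows, growth is clamped to the maximum connection count, and the graph sums the deltas in a `LongAccumulator`.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.function.LongConsumer;

/** Hypothetical stand-in for a per-node neighbor list (NOT Lucene's NeighborArray). */
final class SketchNeighborArray {
  private static final int INITIAL_CAPACITY = 4;
  private final int maxSize;               // hard cap: never allocate beyond this
  private final LongConsumer onRamChange;  // hypothetical callback into the owning graph
  private int[] nodes;
  private float[] scores;
  private int size;

  SketchNeighborArray(int maxSize, LongConsumer onRamChange) {
    if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be > 0");
    this.maxSize = maxSize;
    this.onRamChange = onRamChange;
    int cap = Math.min(INITIAL_CAPACITY, maxSize);
    nodes = new int[cap];
    scores = new float[cap];
    onRamChange.accept(bytesFor(cap)); // report the initial allocation
  }

  /** Assumed cost model: one int plus one float per slot; headers are ignored. */
  static long bytesFor(int capacity) {
    return (long) capacity * (Integer.BYTES + Float.BYTES);
  }

  int capacity() { return nodes.length; }

  int size() { return size; }

  void add(int node, float score) {
    if (size == maxSize) {
      throw new IllegalStateException("neighbor array is full: " + maxSize);
    }
    if (size == nodes.length) grow();
    nodes[size] = node;
    scores[size] = score;
    size++;
  }

  private void grow() {
    int oldCap = nodes.length;
    // Double the capacity, but clamp to maxSize so we never over-allocate.
    int newCap = Math.min(maxSize, oldCap * 2);
    nodes = Arrays.copyOf(nodes, newCap);
    scores = Arrays.copyOf(scores, newCap);
    // Report only the delta so the graph-level total tracks the real growth.
    onRamChange.accept(bytesFor(newCap) - bytesFor(oldCap));
  }
}

/** Hypothetical stand-in for the graph that owns the arrays (NOT Lucene's OnHeapHnswGraph). */
final class SketchGraph {
  // Thread-safe running sum; many threads may grow different arrays concurrently.
  private final LongAccumulator neighborBytes = new LongAccumulator(Long::sum, 0L);
  private final int maxConn;

  SketchGraph(int maxConn) { this.maxConn = maxConn; }

  SketchNeighborArray newNeighborArray() {
    return new SketchNeighborArray(maxConn, this::updateEstimate);
  }

  /** Package-private hook the arrays call when their footprint changes. */
  void updateEstimate(long delta) {
    neighborBytes.accumulate(delta);
  }

  long ramBytesUsed() {
    long total = neighborBytes.get();
    assert total >= 0 : "RAM estimate went negative: " + total;
    return total;
  }
}
```

Reporting deltas rather than absolute sizes keeps the hot path to a single `accumulate` call per growth. `LongAccumulator(Long::sum, 0L)` behaves like a `LongAdder`, and either avoids contention when many threads add neighbors at once. A real implementation would more likely size the arrays with Lucene's `RamUsageEstimator` so that array headers are counted too.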