zhaih commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1408142129


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -31,18 +33,21 @@
  *
  * @lucene.internal
  */
-public class NeighborArray {
+public class NeighborArray implements Accountable {
+  private static final int INITIAL_CAPACITY = 10;

Review Comment:
   It's ok to grow because it's under lock as @msokolov pointed out. However I 
think we previously avoid growing because we don't want to spend extra time on 
resizing and copy-paste?
   Could you run the knn benchmark to check the performance is ok? (Maybe 
better the multithread merge version to check btw the resizing is safe



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -275,21 +275,15 @@ private void generateLevelToNodes() {
 
   @Override
   public long ramBytesUsed() {
-    long neighborArrayBytes0 =
-        (long) nsize0 * (Integer.BYTES + Float.BYTES)
-            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2L
-            + RamUsageEstimator.NUM_BYTES_OBJECT_REF * 2L
-            + Integer.BYTES * 3;
-    long neighborArrayBytes =
-        (long) nsize * (Integer.BYTES + Float.BYTES)
-            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2L
-            + RamUsageEstimator.NUM_BYTES_OBJECT_REF * 2L
-            + Integer.BYTES * 3;
     long total = 0;
-    total +=
-        size() * (neighborArrayBytes0 + 
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER)
-            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER; // for graph and level 
0;
-    total += nonZeroLevelSize.get() * neighborArrayBytes; // for non-zero level
+    for (NeighborArray[] neighborArraysPerNode : graph) {
+      if (neighborArraysPerNode != null) {
+        for (NeighborArray neighborArrayPerNodeAndLevel : 
neighborArraysPerNode) {
+          total += neighborArrayPerNodeAndLevel.ramBytesUsed();

Review Comment:
   Hmmm this can make this method potentially a bit slow, it might be ok since 
normally nobody will call it in any critical code path I guess, but it would 
still be better if we can have an estimation that runs faster rather than the 
every accurate account?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to