msokolov commented on code in PR #11743: URL: https://github.com/apache/lucene/pull/11743#discussion_r961747487
########## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ########## @@ -104,8 +104,8 @@ public void removeLast() { } public void removeIndex(int idx) { - System.arraycopy(node, idx + 1, node, idx, size - idx); - System.arraycopy(score, idx + 1, score, idx, size - idx); + System.arraycopy(node, idx + 1, node, idx, size - idx - 1); Review Comment: oh, good! Is it OK when `size - idx - 1` == 0? I think so, at least I checked javadocs and they say only length < 0 raises an error ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -175,20 +175,28 @@ public long ramBytesUsed() { long neighborArrayBytes0 = nsize0 * (Integer.BYTES + Float.BYTES) + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2 - + RamUsageEstimator.NUM_BYTES_OBJECT_REF; + + RamUsageEstimator.NUM_BYTES_OBJECT_REF + + Integer.BYTES * 2; long neighborArrayBytes = nsize * (Integer.BYTES + Float.BYTES) + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2 - + RamUsageEstimator.NUM_BYTES_OBJECT_REF; - + + RamUsageEstimator.NUM_BYTES_OBJECT_REF + + Integer.BYTES * 2; long total = 0; for (int l = 0; l < numLevels; l++) { int numNodesOnLevel = graph.get(l).size(); Review Comment: were we overestimating before because not all the nodes were populated? ########## lucene/core/src/test/org/apache/lucene/util/TestRamUsageEstimator.java: ########## @@ -222,6 +229,33 @@ public void testPrintValues() { System.out.println("LONG_SIZE = " + LONG_SIZE); } + public void testHnswGraph() throws IOException { + int size = atLeast(2000); + int dim = randomIntBetween(100, 1024); + int M = randomIntBetween(4, 96); + VectorSimilarityFunction similarityFunction = + VectorSimilarityFunction.values()[ + random().nextInt(VectorSimilarityFunction.values().length - 1) + 1]; + VectorEncoding vectorEncoding; + if (similarityFunction == VectorSimilarityFunction.DOT_PRODUCT) { Review Comment: I don't think you need this check -- it's OK now to use BYTE encoding with other similarities -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org