zhaih commented on code in PR #12480:
URL: https://github.com/apache/lucene/pull/12480#discussion_r1280985676

##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -31,20 +33,21 @@ public class NeighborArray {
   private final boolean scoresDescOrder;
   private int size;
-
   float[] score;
   int[] node;
   private int sortedNodeSize;
+  private HashMap<Integer, ScoringFunction> scoringContext;

   public NeighborArray(int maxSize, boolean descOrder) {
     node = new int[maxSize];
     score = new float[maxSize];
     this.scoresDescOrder = descOrder;
+    scoringContext = new HashMap<>();

Review Comment:
   We don't actually need a `HashMap` here. I believe an array of length `maxSize` is more than enough, since at most we're mapping `idx` to `ScoringFunction`, right?

##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -111,6 +129,12 @@ public int[] sort() {
   private int insertSortedInternal() {

Review Comment:
   To further reduce memory usage and eliminate potential GC overhead, I would suggest not storing `ScoringFunction` at all.

   In theory, to compute a score we need 3 things: SimilarityFunction, node_emb_1, node_emb_2. The node embeddings can be obtained by calling `vectors.vectorValue(nodeId)` from `HnswGraphBuilder`, and `SimilarityFunction` is also held by `HnswGraphBuilder`.

   That means we don't even need to store the similarityFunction and the two byte arrays beforehand. We can let `HnswGraphBuilder` pass in a `BiFunction<Integer, Integer, Float>` to perform the scoring when we need it, and inside `NeighborArray` we just need to give the two nodes' ids to that function.

   That way we don't need to store extra context inside `NeighborArray`, and we also avoid holding a lot of `byte[]` for too long. (If you think about it, we're holding all the byte/float arrays necessary for score computation until we have computed the score or the graph is constructed; that's a huge GC load if your computer doesn't have enough memory.)
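
A rough sketch of the callback idea from the second comment, for illustration only and not the PR's code. All names here (`LazyNeighborArraySketch`, `GraphBuilderSketch`, `addUnscored`, `computeScores`) are hypothetical, and a plain `float[][]` with a dot product stands in for `vectors.vectorValue(nodeId)` and the `SimilarityFunction`. The point is only that the neighbor array keeps node ids and the builder supplies the scoring function when a score is needed.

```java
import java.util.function.BiFunction;

/** Hypothetical stand-in for NeighborArray: keeps node ids, no vectors or scoring context. */
class LazyNeighborArraySketch {
  private final int[] node;
  private final float[] score;
  private final boolean[] scored; // assumption: marks which entries already have a score
  private int size;

  LazyNeighborArraySketch(int maxSize) {
    node = new int[maxSize];
    score = new float[maxSize];
    scored = new boolean[maxSize];
  }

  /** Records a neighbor id without computing or capturing anything needed to score it. */
  void addUnscored(int nodeId) {
    node[size] = nodeId;
    scored[size] = false;
    size++;
  }

  /** Fills in missing scores by handing two node ids to the scorer supplied by the caller. */
  void computeScores(int ownerNode, BiFunction<Integer, Integer, Float> scorer) {
    for (int i = 0; i < size; i++) {
      if (!scored[i]) {
        score[i] = scorer.apply(ownerNode, node[i]);
        scored[i] = true;
      }
    }
  }

  float score(int idx) {
    return score[idx];
  }
}

/** Hypothetical stand-in for HnswGraphBuilder: owns the vectors and the similarity. */
class GraphBuilderSketch {
  private final float[][] vectors; // stand-in for the builder's vector values

  GraphBuilderSketch(float[][] vectors) {
    this.vectors = vectors;
  }

  /** Dot product as a placeholder similarity; vectors are looked up only at scoring time. */
  float score(int a, int b) {
    float sum = 0f;
    for (int i = 0; i < vectors[a].length; i++) {
      sum += vectors[a][i] * vectors[b][i];
    }
    return sum;
  }
}
```

With this shape the builder would call something like `neighbors.computeScores(ownerNode, builder::score)`. Note that `BiFunction<Integer, Integer, Float>` boxes its arguments and result on every call, so a primitive-specialized functional interface might be worth considering if that shows up in profiles.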