kaivalnp commented on code in PR #14963:
URL: https://github.com/apache/lucene/pull/14963#discussion_r2379475856
########## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ##########
@@ -550,17 +573,28 @@ static int distFuncToOrd(VectorSimilarityFunction func) {
     throw new IllegalArgumentException("invalid distance function: " + func);
   }
 
+  private static int graphCreationThreshold(int k, int numNodes) {
+    return (int)
+        Math.pow(10, String.valueOf(HnswGraphSearcher.expectedVisitedNodes(k, numNodes)).length());
+  }
+
   private static class FieldWriter<T> extends KnnFieldVectorsWriter<T> {
     private static final long SHALLOW_SIZE =
         RamUsageEstimator.shallowSizeOfInstance(FieldWriter.class);
     private final FieldInfo fieldInfo;
-    private final HnswGraphBuilder hnswGraphBuilder;
+    private HnswGraphBuilder hnswGraphBuilder; // only created when needed
     private int lastDocID = -1;
     private int node = 0;
     private final FlatFieldVectorsWriter<T> flatFieldVectorsWriter;
     private UpdateableRandomVectorScorer scorer;
+    private final int graphThreshold;
+    private final List<T> bufferedVectors;

Review Comment:
I wonder if we can avoid this duplicate list of buffered vectors? The local `FlatFieldVectorsWriter` already stores these vectors and exposes them via [`getVectors()`](https://github.com/apache/lucene/blob/e706267b893576cd334a783e6dfa8b4008cdc7b2/lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatFieldVectorsWriter.java#L35).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
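The reviewer's suggestion (drop `bufferedVectors` and replay the flat writer's own vectors when the graph is finally needed) can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not the PR's code: `FlatVectorsBuffer` and `GraphBuilderStub` are hypothetical stand-ins for `FlatFieldVectorsWriter` and `HnswGraphBuilder`, `addValue(T)` is a simplified hypothetical signature, the visited-node estimate is passed in as a plain `int` instead of calling `HnswGraphSearcher.expectedVisitedNodes(k, numNodes)`, and building the graph once the vector count *reaches* the threshold (rather than exceeds it) is an assumption. It also reuses the digit-count trick from `graphCreationThreshold`: for a positive estimate `v`, 10 raised to the number of decimal digits of `v` is the smallest power of ten strictly greater than `v` (1234 gives 10000, and so does 1000).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for FlatFieldVectorsWriter: the single owner of every buffered vector. */
class FlatVectorsBuffer<T> {
  private final List<T> vectors = new ArrayList<>();

  void addValue(T vector) {
    vectors.add(vector);
  }

  List<T> getVectors() {
    return vectors;
  }
}

/** Hypothetical stand-in for HnswGraphBuilder: only records which ordinals were inserted. */
class GraphBuilderStub<T> {
  final List<Integer> insertedNodes = new ArrayList<>();

  void addGraphNode(int node, T vector) {
    insertedNodes.add(node);
  }
}

/** Sketch of a field writer that defers graph construction without a second vector list. */
class LazyGraphFieldWriter<T> {
  private final FlatVectorsBuffer<T> flatWriter = new FlatVectorsBuffer<>();
  private final int graphThreshold;
  private GraphBuilderStub<T> graphBuilder; // null until the threshold is reached

  // Assumption: the visited-node estimate is supplied directly rather than computed here.
  LazyGraphFieldWriter(int expectedVisitedNodes) {
    this.graphThreshold = graphCreationThreshold(expectedVisitedNodes);
  }

  /** Same trick as the diff: 10^(number of decimal digits of the estimate). */
  static int graphCreationThreshold(int expectedVisitedNodes) {
    return (int) Math.pow(10, String.valueOf(expectedVisitedNodes).length());
  }

  void addValue(T vector) {
    flatWriter.addValue(vector);
    List<T> buffered = flatWriter.getVectors();
    if (graphBuilder != null) {
      // Graph already exists: insert the new vector directly.
      graphBuilder.addGraphNode(buffered.size() - 1, vector);
    } else if (buffered.size() >= graphThreshold) {
      // Threshold reached: replay what the flat writer already holds, no duplicate list needed.
      graphBuilder = new GraphBuilderStub<>();
      for (int ord = 0; ord < buffered.size(); ord++) {
        graphBuilder.addGraphNode(ord, buffered.get(ord));
      }
    }
  }

  boolean hasGraph() {
    return graphBuilder != null;
  }

  GraphBuilderStub<T> graph() {
    return graphBuilder;
  }
}
```

With an estimate of 5 the threshold is 10: the first nine vectors only land in the flat buffer, the tenth triggers a replay of all ten, and later vectors go into the graph directly. Keeping the flat writer as the only owner of the buffered vectors would presumably also leave one fewer collection to account for in the writer's RAM estimate.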