vigyasharma commented on code in PR #14963:
URL: https://github.com/apache/lucene/pull/14963#discussion_r2217673778
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##########
@@ -632,8 +695,23 @@ public void addValue(int docID, T vectorValue) throws IOException {
                 + "\" appears more than once in this document (only one value is allowed per field)");
       }
       flatFieldVectorsWriter.addValue(docID, vectorValue);
-      scorer.setScoringOrdinal(node);
-      hnswGraphBuilder.addGraphNode(node, scorer);
+      // Check if we need to initialize graph builder for tiny segment optimization
+      if (bypassTinySegments
+          && graphBuilderInitialized == false
+          && node >= Lucene99HnswVectorsFormat.HNSW_GRAPH_THRESHOLD) {
+        initializeGraphBuilder();
+        // Replay buffered vectors
+        replayBufferedVectors();
+        bufferedVectors.clear();
+      }
+      if (hnswGraphBuilder != null) {

Review Comment:
   Does `hnswGraphBuilder != null` do the same thing as `graphBuilderInitialized`? If so, do we need `graphBuilderInitialized`?

##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##########
@@ -598,10 +629,21 @@ static FieldWriter<?> create(
         FieldInfo fieldInfo,
         int M,
         int beamWidth,
-        InfoStream infoStream)
+        InfoStream infoStream,
+        boolean bypassTinySegments)
         throws IOException {
       this.fieldInfo = fieldInfo;
-      RandomVectorScorerSupplier scorerSupplier =
+      this.M = M;
+      this.beamWidth = beamWidth;
+      this.infoStream = infoStream;
+      this.bypassTinySegments = bypassTinySegments;
+      this.flatFieldVectorsWriter = Objects.requireNonNull(flatFieldVectorsWriter);
+      if (bypassTinySegments) {
+        this.bufferedVectors = new ArrayList<>();

Review Comment:
   Since we only store up to `HNSW_GRAPH_THRESHOLD` vectors, beyond which we resume the regular flow of adding them to the graph, could we use an array here instead of an `ArrayList`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
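
For reference, the two review suggestions above (use the builder's nullness as the only "initialized" signal, and buffer into a fixed-size array rather than an `ArrayList`) could look roughly like the sketch below. This is not the PR's code: `SketchGraphBuilder`, `TinySegmentFieldWriter`, `hasGraph()` and `graphNodes()` are hypothetical stand-ins, the threshold is passed in as a plain `int` rather than read from `Lucene99HnswVectorsFormat.HNSW_GRAPH_THRESHOLD`, and it assumes the buffer holds the vectors themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the HNSW graph builder; it only records node ordinals. */
final class SketchGraphBuilder {
  final List<Integer> nodes = new ArrayList<>();

  void addGraphNode(int node, float[] vector) {
    nodes.add(node);
  }
}

/** Hypothetical field writer showing an array buffer and a null check in place of a flag. */
final class TinySegmentFieldWriter {
  private final int threshold;
  private final float[][] buffered; // at most `threshold` vectors are ever buffered
  private int bufferedCount;
  private int nextNode;
  private SketchGraphBuilder graphBuilder; // null means "not initialized yet"

  TinySegmentFieldWriter(int threshold, boolean bypassTinySegments) {
    this.threshold = threshold;
    this.buffered = new float[bypassTinySegments ? threshold : 0][];
    if (bypassTinySegments == false) {
      graphBuilder = new SketchGraphBuilder(); // regular flow: build the graph from the start
    }
  }

  void addValue(float[] vector) {
    int node = nextNode++;
    if (graphBuilder == null && node >= threshold) {
      graphBuilder = new SketchGraphBuilder();
      // Replay the buffered vectors in ordinal order, then release them.
      for (int i = 0; i < bufferedCount; i++) {
        graphBuilder.addGraphNode(i, buffered[i]);
        buffered[i] = null;
      }
      bufferedCount = 0;
    }
    if (graphBuilder != null) {
      graphBuilder.addGraphNode(node, vector);
    } else {
      buffered[bufferedCount++] = vector;
    }
  }

  boolean hasGraph() {
    return graphBuilder != null;
  }

  List<Integer> graphNodes() {
    return graphBuilder == null ? List.of() : graphBuilder.nodes;
  }
}
```

Because buffering only happens while `node < threshold`, the array never needs to grow, which is the property the second comment relies on.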