benwtrent commented on code in PR #14963: URL: https://github.com/apache/lucene/pull/14963#discussion_r2213897171
########## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ########## @@ -137,9 +144,16 @@ public final class Lucene99HnswVectorsFormat extends KnnVectorsFormat { private final int numMergeWorkers; private final TaskExecutor mergeExec; + /** + * Whether to bypass HNSW graph building for tiny segments (below {@link #HNSW_GRAPH_THRESHOLD}). + * When enabled, segments with fewer than the threshold number of vectors will store only flat + * vectors, significantly improving indexing performance for workloads with frequent flushes. + */ + private final boolean bypassTinySegments; Review Comment: If we allow this to be a parameter, it should be a threshold that refers to the typical `k` used when querying. ########## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ########## @@ -326,16 +335,19 @@ private void search( final KnnCollector collector = new OrdinalTranslatedKnnCollector(knnCollector, scorer::ordToDoc); final Bits acceptedOrds = scorer.getAcceptOrds(acceptDocs); - HnswGraph graph = getGraph(fieldEntry); - boolean doHnsw = knnCollector.k() < scorer.maxOrd(); + boolean doHnsw = + knnCollector.k() < scorer.maxOrd() + && (bypassTinySegments == false + || fieldEntry.size() > Lucene99HnswVectorsFormat.HNSW_GRAPH_THRESHOLD); Review Comment: The reader should just look to see if there is a graph. ########## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ########## @@ -632,8 +695,23 @@ public void addValue(int docID, T vectorValue) throws IOException { + "\" appears more than once in this document (only one value is allowed per field)"); } flatFieldVectorsWriter.addValue(docID, vectorValue); - scorer.setScoringOrdinal(node); - hnswGraphBuilder.addGraphNode(node, scorer); + // Check if we need to initialize graph builder for tiny segment optimization + if (bypassTinySegments + && graphBuilderInitialized == false + && node >= Lucene99HnswVectorsFormat.HNSW_GRAPH_THRESHOLD) { Review Comment: I am thinking we should just use the `expectedNodeVisited` logic for the given node count (treating like its a "graph") vs. the tiny segment threshold number (which should be used as a `k`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org