benwtrent commented on code in PR #14963:
URL: https://github.com/apache/lucene/pull/14963#discussion_r2213897171


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:
##########
@@ -137,9 +144,16 @@ public final class Lucene99HnswVectorsFormat extends 
KnnVectorsFormat {
   private final int numMergeWorkers;
   private final TaskExecutor mergeExec;
 
+  /**
+   * Whether to bypass HNSW graph building for tiny segments (below {@link 
#HNSW_GRAPH_THRESHOLD}).
+   * When enabled, segments with fewer than the threshold number of vectors 
will store only flat
+   * vectors, significantly improving indexing performance for workloads with 
frequent flushes.
+   */
+  private final boolean bypassTinySegments;

Review Comment:
   If we allow this to be a parameter, it should be a threshold that refers to 
the typical `k` used when querying.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##########
@@ -326,16 +335,19 @@ private void search(
     final KnnCollector collector =
         new OrdinalTranslatedKnnCollector(knnCollector, scorer::ordToDoc);
     final Bits acceptedOrds = scorer.getAcceptOrds(acceptDocs);
-    HnswGraph graph = getGraph(fieldEntry);
-    boolean doHnsw = knnCollector.k() < scorer.maxOrd();
+    boolean doHnsw =
+        knnCollector.k() < scorer.maxOrd()
+            && (bypassTinySegments == false
+                || fieldEntry.size() > 
Lucene99HnswVectorsFormat.HNSW_GRAPH_THRESHOLD);

Review Comment:
   The reader should just look to see if there is a graph.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##########
@@ -632,8 +695,23 @@ public void addValue(int docID, T vectorValue) throws 
IOException {
                 + "\" appears more than once in this document (only one value 
is allowed per field)");
       }
       flatFieldVectorsWriter.addValue(docID, vectorValue);
-      scorer.setScoringOrdinal(node);
-      hnswGraphBuilder.addGraphNode(node, scorer);
+      // Check if we need to initialize graph builder for tiny segment 
optimization
+      if (bypassTinySegments
+          && graphBuilderInitialized == false
+          && node >= Lucene99HnswVectorsFormat.HNSW_GRAPH_THRESHOLD) {

Review Comment:
   I am thinking we should just use the `expectedNodeVisited` logic for the 
given node count (treating like its a "graph") vs. the tiny segment threshold 
number (which should be used as a `k`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to