msokolov commented on code in PR #14963:
URL: https://github.com/apache/lucene/pull/14963#discussion_r2248750458


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:
##########
@@ -137,9 +144,16 @@ public final class Lucene99HnswVectorsFormat extends KnnVectorsFormat {
   private final int numMergeWorkers;
   private final TaskExecutor mergeExec;
 
+  /**
+   * Whether to bypass HNSW graph building for tiny segments (below {@link #HNSW_GRAPH_THRESHOLD}).
+   * When enabled, segments with fewer than the threshold number of vectors will store only flat
+   * vectors, significantly improving indexing performance for workloads with frequent flushes.
+   */
+  private final boolean bypassTinySegments;

Review Comment:
   I do wonder whether we want to expose this as a parameter, though. Maybe it
should just be a fixed value? I would have thought about setting it at the
threshold where exhaustive search is no more, or only slightly more, expensive
than HNSW search. I would expect that threshold to be related to the M of the
graph, maybe?
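
To make the suggestion concrete, here is a rough sketch of how such a crossover point could be estimated. It is not taken from the PR. The cost model is an assumption made only for illustration: exhaustive search costs one distance computation per vector, and an HNSW search costs about `visitFactor * M * log2(n)` distance computations. The class `GraphThresholdSketch`, its methods, and the `visitFactor` parameter are all hypothetical names, and a real threshold would need to come from benchmarks.

```java
public final class GraphThresholdSketch {

  private GraphThresholdSketch() {}

  /** Assumed cost of an exhaustive scan: one distance computation per vector. */
  public static double exhaustiveCost(int numVectors) {
    return numVectors;
  }

  /**
   * Assumed cost of an HNSW search: visitFactor * M * log2(numVectors) distance
   * computations. This is an illustrative model, not a measured one.
   */
  public static double hnswCost(int numVectors, int m, double visitFactor) {
    return visitFactor * m * (Math.log(numVectors) / Math.log(2));
  }

  /**
   * Returns the smallest segment size (at least 2) at which, under the assumed
   * model, the exhaustive scan becomes more expensive than the HNSW search.
   */
  public static int crossover(int m, double visitFactor) {
    if (m <= 0 || !(visitFactor > 0)) {
      throw new IllegalArgumentException("m and visitFactor must be positive");
    }
    for (int n = 2; n < Integer.MAX_VALUE; n++) {
      if (exhaustiveCost(n) > hnswCost(n, m, visitFactor)) {
        return n;
      }
    }
    return Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    // Print the modeled crossover for a few common values of M.
    for (int m : new int[] {8, 16, 32, 64}) {
      System.out.println("M=" + m + " -> crossover near " + crossover(m, 1.0) + " vectors");
    }
  }
}
```

Under this model the crossover grows with M, which matches the intuition that a fixed threshold could be derived from M instead of being exposed as a separate option.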



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

