jpountz commented on code in PR #14963: URL: https://github.com/apache/lucene/pull/14963#discussion_r2248862786
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:
##########
@@ -115,6 +115,13 @@ public final class Lucene99HnswVectorsFormat extends KnnVectorsFormat {
   /** Default to use single thread merge */
   public static final int DEFAULT_NUM_MERGE_WORKER = 1;
 
+  /**
+   * Threshold below which HNSW graph building is bypassed for tiny segments. Segments with fewer
+   * vectors will use flat storage only, improving indexing performance when having frequent
+   * flushes.
+   */
+  public static final int HNSW_GRAPH_THRESHOLD = 10_000;

Review Comment:
   I think the comment should say more about how this value was chosen, to help future readers judge whether it is still right or needs updating. One thing we discussed on the linked issue is that the number of visited nodes is on the order of `log(size) * k`. So having a graph only helps if `log(size) * k << size` <=> `size / log(size) >> k`. If we arbitrarily choose k = 100, 10,000 is the first power of 10 for which `size / log(size)` is one order of magnitude greater than k (using the natural log: 10/log(10) ~= 4.3, 100/log(100) ~= 22, 1000/log(1000) ~= 145, 10000/log(10000) ~= 1086).
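
   The arithmetic in the review comment can be checked with a short sketch. This is only an illustration, not code from the PR: the class name `GraphThresholdSketch`, both method names, and the `factor` parameter (10 standing in for "one order of magnitude") are hypothetical. It assumes the natural logarithm, which matches the values quoted in the comment.

```java
public class GraphThresholdSketch {

    /** size / ln(size): the quantity the review comment compares against k. */
    public static double sizeOverLog(long size) {
        return size / Math.log(size);
    }

    /**
     * Smallest power of 10 for which size / ln(size) exceeds factor * k.
     * Hypothetical helper; assumes k and factor are small enough that the
     * answer fits comfortably in a long.
     */
    public static long firstPowerOfTenExceeding(int k, double factor) {
        long size = 10;
        while (sizeOverLog(size) <= factor * k) {
            size *= 10;
        }
        return size;
    }

    public static void main(String[] args) {
        int k = 100; // arbitrary choice, as in the review comment
        for (long size = 10; size <= 100_000; size *= 10) {
            // log(size) * k is the rough estimate of nodes visited by a graph search
            System.out.printf("size=%d  size/log(size)=%.1f  log(size)*k=%.0f%n",
                size, sizeOverLog(size), Math.log(size) * k);
        }
        System.out.println(firstPowerOfTenExceeding(k, 10.0)); // prints 10000
    }
}
```

   With a different assumed `k`, the same helper shows how the threshold would move, which is the kind of reasoning the Javadoc on `HNSW_GRAPH_THRESHOLD` could record.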