shubhamvishu commented on PR #14963: URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242383393
To verify that this isn't a red herring, I deliberately increased `HNSW_GRAPH_THRESHOLD` to `1000000`, effectively preventing the creation of any HNSW graphs (as confirmed by the very low indexing time and the absence of graph layers in the structure). As expected, latency increased significantly due to a fallback to exact search. This validates that the earlier results with `HNSW_GRAPH_THRESHOLD` set to `10` or `100` represent a win-win situation, i.e. we achieve ~4x faster indexing without compromising on latency (in fact, latency improves due to fewer segments).

#### Candidate (with `HNSW_GRAPH_THRESHOLD` = 1000000)

```
: . . :
Leaf 0 has 0 layers
Leaf 0 has 309324 documents
Leaf 1 has 0 layers
Leaf 1 has 154373 documents
Leaf 2 has 0 layers
Leaf 2 has 36303 documents

Results:
recall  latency(ms)  netCPU   avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.514   89.174       89.153   1.000        500000  100   50      64       250        4 bits     26.93     18565.96      3             1651.81         1649.857      185.013      HNSW
0.890   223.908      223.798  1.000        500000  100   50      64       250        7 bits     26.93     18569.41      3             1834.91         1832.962      368.118      HNSW
1.000   140.186      140.176  1.000        500000  100   50      64       250        no         24.19     20667.99      3             1466.79         1464.844      1464.844     HNSW
```
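For readers skimming the thread, here is a rough sketch of the gating behaviour described above. It is not the PR's code. The helper names and the `>=` comparison are assumptions made for illustration, as is treating each document as one vector. The leaf sizes are taken from the output of the `1000000` run and are reused for the other thresholds purely as an example (those runs may have produced different segments).

```python
# Hypothetical sketch only: this is NOT Lucene's implementation.
# Leaf sizes are the document counts reported in the output above.
LEAF_DOC_COUNTS = [309324, 154373, 36303]


def builds_graph(num_vectors: int, threshold: int) -> bool:
    """Assumed rule: build an HNSW graph only if the segment holds
    at least `threshold` vectors; otherwise rely on exact search."""
    return num_vectors >= threshold


def leaves_with_graph(leaf_sizes, threshold):
    """Return the leaf sizes that would get a graph under the assumed rule."""
    return [n for n in leaf_sizes if builds_graph(n, threshold)]


if __name__ == "__main__":
    for threshold in (10, 100, 1_000_000):
        with_graph = leaves_with_graph(LEAF_DOC_COUNTS, threshold)
        print(
            f"threshold={threshold}: "
            f"{len(with_graph)} of {len(LEAF_DOC_COUNTS)} leaves would build a graph"
        )
```

Under this assumed rule, no leaf reaches a threshold of `1000000`, which is consistent with every leaf reporting 0 layers and with the search falling back to exact scoring.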