Re: [PR] Bypass HNSW graph building for tiny segments [lucene]

via GitHub Wed, 03 Sep 2025 00:58:07 -0700


shubhamvishu commented on PR #14963:
URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242383393


   To verify that this isn't a red herring, I deliberately raised 
`HNSW_GRAPH_THRESHOLD` to `1000000`, which effectively prevents any HNSW graph 
from being built (confirmed by the very low indexing time and by every leaf 
reporting 0 graph layers). As expected, latency increased significantly 
because search falls back to exact search. This validates that the earlier 
results with `HNSW_GRAPH_THRESHOLD` set to `10` or `100` are a win-win: we get 
~4x faster indexing without compromising on latency (in fact, latency improves 
due to fewer segments).
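
A minimal sketch of the behavior described above, assuming the threshold is compared against a segment's vector count. It is not the PR's code: the function names, the `>=` comparison, and the dot-product scoring are assumptions chosen for illustration, and only `HNSW_GRAPH_THRESHOLD` comes from the comment.

```python
import heapq

# Hypothetical default; the comment above tries 10, 100 and 1000000.
HNSW_GRAPH_THRESHOLD = 100


def should_build_graph(num_vectors, threshold=HNSW_GRAPH_THRESHOLD):
    """Assumed rule: build an HNSW graph only for segments at or above the threshold."""
    return num_vectors >= threshold


def exact_search(vectors, query, k):
    """Brute-force top-k by dot product: score every vector, keep the best k doc ids."""
    scored = (
        (sum(a * b for a, b in zip(vec, query)), doc_id)
        for doc_id, vec in enumerate(vectors)
    )
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


def search_segment(vectors, query, k, threshold=HNSW_GRAPH_THRESHOLD):
    """Return which strategy a segment would use, plus hits for the exact path."""
    if should_build_graph(len(vectors), threshold):
        return "graph", None  # graph traversal is out of scope for this sketch
    return "exact", exact_search(vectors, query, k)
```

With a threshold of `1000000`, every segment in a 500000-document index takes the `"exact"` branch, whose cost grows linearly with segment size. That is consistent with the latency increase reported here.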
   
   #### Candidate (with `HNSW_GRAPH_THRESHOLD` = 1000000)
   ```
   :
   .
   .
   :
   Leaf 0 has 0 layers
   Leaf 0 has 309324 documents
   Leaf 1 has 0 layers
   Leaf 1 has 154373 documents
   Leaf 2 has 0 layers
   Leaf 2 has 36303 documents
   
   Results:
   recall  latency(ms)   netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.514       89.174   89.153        1.000  500000   100      50       64        250     4 bits     26.93      18565.96             3         1651.81      1649.857      185.013       HNSW
    0.890      223.908  223.798        1.000  500000   100      50       64        250     7 bits     26.93      18569.41             3         1834.91      1832.962      368.118       HNSW
    1.000      140.186  140.176        1.000  500000   100      50       64        250         no     24.19      20667.99             3         1466.79      1464.844     1464.844       HNSW
    ```
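
The "no graphs were built" check can also be done mechanically rather than by eye. The sketch below assumes the leaf lines keep the exact `Leaf N has K layers` / `Leaf N has K documents` shape shown in the output above. The helper name and regex are illustrative and not part of the benchmark tooling.

```python
import re

# Matches the two leaf line shapes printed in the benchmark output.
LEAF_LINE = re.compile(r"Leaf (\d+) has (\d+) (layers|documents)")


def summarize_leaves(log_text):
    """Collect {leaf_id: {"layers": n, "documents": m}} from the leaf lines."""
    leaves = {}
    for leaf_id, count, kind in LEAF_LINE.findall(log_text):
        leaves.setdefault(int(leaf_id), {})[kind] = int(count)
    return leaves


log = """
Leaf 0 has 0 layers
Leaf 0 has 309324 documents
Leaf 1 has 0 layers
Leaf 1 has 154373 documents
Leaf 2 has 0 layers
Leaf 2 has 36303 documents
"""

leaves = summarize_leaves(log)
no_graphs = all(leaf["layers"] == 0 for leaf in leaves.values())
total_docs = sum(leaf["documents"] for leaf in leaves.values())
print(no_graphs, total_docs)  # True 500000
```

The document total matches the `nDoc` column (500000), and the segment count matches `num_segments` (3).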
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

