[ 
https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264190#comment-17264190
 ] 

Michael McCandless commented on LUCENE-9644:
--------------------------------------------

This change seems to have caused a surprising jump in Lucene's nightly 
{{VectorSearch}} performance: 
[https://home.apache.org/~mikemccand/lucenebench/VectorSearch.html] (annotation 
DJ).

> HNSW diverse neighbor selection heuristic
> -----------------------------------------
>
>                 Key: LUCENE-9644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9644
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will replace the simple nearest neighbor selection with a criterion that 
> takes into account the distance of the neighbors from each other. It is seen 
> to provide dramatically improved recall on at least two datasets, and is what 
> is being used by our reference implementation, hnswlib. The basic idea is 
> that when selecting  the nearest neighbors to associate with a new node added 
> to the graph, we filter using a diversity criterion. If a candidate neighbor 
> is closer to an already-added (closer to the new node) neighbor than it is to 
> the new node, then we pass over it, moving on to more-distant, but presumably 
> more diverse neighbors. The same criterion is also (re-) applied to the 
> neighbor nodes' neighbors, since we add the links bidirectionally.
> h2. Results:
> h3. GloVe/Wikipedia
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.643|        0.77|  100000| 50|     32|     64|     1742|   22830|
> |0.671|        0.95|  100000| 100|    32|     64|     2141|   0|
> |0.704|        1.32|  100000| 200|    32|     64|     2923|   0|
> |0.739|        2.04|  100000| 400|    32|     64|     4382|   0|
> |0.470|        0.91|  1000000|        50|     32|     64|     2068|   337081|
> |0.496|        1.21|  1000000|        100|    32|     64|     2548|   0|
> |0.533|        1.77|  1000000|        200|    32|     64|     3479|   0|
> |0.573|        2.58|  1000000|        400|    32|     64|     5257|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.801|        0.57|  100000| 50|     32|     64|     593|    17985|
> |0.840|        0.67|  100000| 100|    32|     64|     738|    0|
> |0.883|        0.97|  100000| 200|    32|     64|     1018|   0|
> |0.921|        1.36|  100000| 400|    32|     64|     1502|   0|
> |0.723|        0.71|  1000000|        50|     32|     64|     860|    298383|
> |0.761|        0.77|  1000000|        100|    32|     64|     1058|   0|
> |0.806|        1.06|  1000000|        200|    32|     64|     1442|   0|
> |0.854|        1.67|  1000000|        400|    32|     64|     2159|   0|
> h3. Dataset from work:
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.933|        1.41|  100000| 50|     32|     64|     1496|   35462|
> |0.948|        1.39|  100000| 100|    32|     64|     1872|   0|
> |0.961|        2.10|  100000| 200|    32|     64|     2591|   0|
> |0.972|        3.04|  100000| 400|    32|     64|     3939|   0|
> |0.827|        1.34|  1000000|        50|     32|     64|     1676|   535802|
> |0.854|        1.76|  1000000|        100|    32|     64|     2056|   0|
> |0.887|        2.47|  1000000|        200|    32|     64|     2761|   0|
> |0.907|        3.75|  1000000|        400|    32|     64|     4129|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.966|        1.18|  100000| 50|     32|     64|     1480|   37656|
> |0.977|        1.46|  100000| 100|    32|     64|     1832|   0|
> |0.988|        2.00|  100000| 200|    32|     64|     2472|   0|
> |0.995|        3.14|  100000| 400|    32|     64|     3629|   0|
> |0.944|        1.34|  1000000|        50|     32|     64|     1780|   526834|
> |0.959|        1.71|  1000000|        100|    32|     64|     2222|   0|
> |0.975|        2.30|  1000000|        200|    32|     64|     3041|   0|
> |0.986|        3.56|  1000000|        400|    32|     64|     4543|   0|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to