[ https://issues.apache.org/jira/browse/LUCENE-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julie Tibshirani updated LUCENE-10527:
--------------------------------------
Description:
Recently I was rereading the HNSW paper
([https://arxiv.org/pdf/1603.09320.pdf]) and noticed that they suggest using a
different maxConn for the upper layers vs. the bottom one (which contains the
full neighborhood graph). Specifically, they suggest using maxConn=M for upper
layers and maxConn=2*M for the bottom. This differs from what we do, which is
to use maxConn=M for all layers.
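To make the rule concrete, here is a minimal sketch of the per-layer cap (a hypothetical helper for illustration, not Lucene's actual HnswGraphBuilder code):
{code:java}
// Sketch of the paper's rule: maxConn = M on the upper layers, 2*M on layer 0,
// since layer 0 holds the full neighborhood graph. Illustrative only.
final class MaxConnPolicy {
  private final int m;

  MaxConnPolicy(int m) {
    this.m = m;
  }

  /** Maximum number of neighbors a node may keep on the given layer. */
  int maxConnOnLevel(int level) {
    return level == 0 ? 2 * m : m;
  }
}
{code}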
I tried updating our logic using a hacky patch, and noticed an improvement in
latency for higher recall values (which is consistent with the paper's
observation):
*Results on glove-100-angular*
Parameters: M=32, efConstruction=100
!image-2022-04-20-14-53-58-484.png|width=400,height=367!
As we'd expect, indexing becomes a bit slower:
{code:java}
Baseline: Indexed 1183514 documents in 733s
Candidate: Indexed 1183514 documents in 948s{code}
When we benchmarked Lucene HNSW against hnswlib in LUCENE-9937, we noticed a
big difference in recall for the same settings of M and efConstruction. (Even
adding graph layers in LUCENE-10054 didn't really affect recall.) With this
change, the recall is now very similar:
*Results on glove-100-angular*
Parameters: M=32, efConstruction=100
{code:java}
num_cands  Approach                                             Recall  QPS
10         luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.563   4410.499
50         luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.798   1956.280
100        luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.862   1209.734
500        luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.958   341.428
800        luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.974   230.396
1000       luceneknn dim=100 {'M': 32, 'efConstruction': 100}   0.980   188.757
10         hnswlib ({'M': 32, 'efConstruction': 100})           0.552   16745.433
50         hnswlib ({'M': 32, 'efConstruction': 100})           0.794   5738.468
100        hnswlib ({'M': 32, 'efConstruction': 100})           0.860   3336.386
500        hnswlib ({'M': 32, 'efConstruction': 100})           0.956   832.982
800        hnswlib ({'M': 32, 'efConstruction': 100})           0.973   541.097
1000       hnswlib ({'M': 32, 'efConstruction': 100})           0.979   442.163
{code}
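For reference, these benchmark parameters correspond to Lucene's index-time knobs roughly as maxConn = M and beamWidth = efConstruction. A hedged sketch of wiring them up, assuming Lucene 9.1's Lucene91Codec and Lucene91HnswVectorsFormat (treat the exact class names as illustrative):
{code:java}
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene91.Lucene91Codec;
import org.apache.lucene.codecs.lucene91.Lucene91HnswVectorsFormat;
import org.apache.lucene.index.IndexWriterConfig;

// Illustrative only: configure HNSW with maxConn=32 (M) and beamWidth=100
// (efConstruction), matching the benchmark parameters above.
public class HnswBenchConfig {
  static IndexWriterConfig newConfig() {
    return new IndexWriterConfig()
        .setCodec(
            new Lucene91Codec() {
              @Override
              public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                return new Lucene91HnswVectorsFormat(/* maxConn */ 32, /* beamWidth */ 100);
              }
            });
  }
}
{code}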
I think it'd be nice to update maxConn so that we faithfully implement the
paper's algorithm. This is probably the least surprising behavior for users, and
I don't see a strong reason to take a different approach from the paper. Let me
know what you think!
> Use bigger maxConn for last layer in HNSW
> -----------------------------------------
>
> Key: LUCENE-10527
> URL: https://issues.apache.org/jira/browse/LUCENE-10527
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Julie Tibshirani
> Priority: Minor
> Attachments: hnsw_plot.png, image-2022-04-20-14-53-58-484.png