[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540284#comment-17540284 ]

Michael Sokolov commented on LUCENE-9625:
-----------------------------------------

I realized maybe this deserves a better explanation: I didn't use
multi-threading in the KnnGraphTester that builds the index, since my goal at
the time was really to evaluate whether our algorithm implementation is correct
and how it performs on a single HNSW graph index. Using multiple threads would
lead to a more fragmented graph because of the way Lucene indexes segments:
each segment gets its own graph. While this would be a useful point of
comparison, it also introduces another variable to tune in the benchmark
evaluation. If you do want to pursue this, I would suggest configuring the
IndexWriterConfig with large buffers so that each thread creates a single
segment, and exposing the number of threads/segments as a tunable parameter,
since it is going to impact the recall and latency reported by the benchmark.
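Below is a minimal, stdlib-only Java sketch of the trade-off described above. It rests on three assumptions. First, each indexing thread is modeled as producing exactly one segment, which is what the large-buffer IndexWriterConfig suggestion aims for. Second, a query searches every segment independently and then merges the per-segment top-k. Third, brute-force exact search stands in for each segment's HNSW graph. `SegmentedKnnSketch`, `numSegments` and the helper methods are hypothetical names; they are not part of KnnGraphTester or the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch, not KnnGraphTester or Lucene code: one segment per
// indexing thread, and a query is answered by searching every segment and merging.
final class SegmentedKnnSketch {

    /** A search hit: global doc id and squared Euclidean distance to the query. */
    static final class Hit {
        final int doc;
        final float dist;

        Hit(int doc, float dist) {
            this.doc = doc;
            this.dist = dist;
        }
    }

    /** Splits doc ids [0, n) into numSegments contiguous ranges, one per indexing thread. */
    static List<int[]> partition(int n, int numSegments) {
        List<int[]> segments = new ArrayList<>();
        for (int s = 0; s < numSegments; s++) {
            int start = (int) ((long) n * s / numSegments);
            int end = (int) ((long) n * (s + 1) / numSegments);
            int[] docs = new int[end - start];
            for (int i = 0; i < docs.length; i++) {
                docs[i] = start + i;
            }
            segments.add(docs);
        }
        return segments;
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Top-k inside one segment; brute force stands in for that segment's HNSW graph. */
    static List<Hit> searchSegment(float[][] vectors, int[] docs, float[] query, int k) {
        // Max-heap on distance: the farthest of the current k candidates is evicted first.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.dist).reversed());
        for (int doc : docs) {
            heap.offer(new Hit(doc, squaredDistance(vectors[doc], query)));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        return new ArrayList<>(heap);
    }

    /** Searches each segment independently, then merges per-segment hits into a global top-k. */
    static int[] search(float[][] vectors, int numSegments, float[] query, int k) {
        List<Hit> merged = new ArrayList<>();
        for (int[] docs : partition(vectors.length, numSegments)) {
            merged.addAll(searchSegment(vectors, docs, query, k));
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.dist).thenComparingInt(h -> h.doc));
        return merged.stream().limit(k).mapToInt(h -> h.doc).toArray();
    }

    /** recall@k: fraction of the true nearest neighbors that appear in the returned ids. */
    static double recall(int[] expected, int[] actual) {
        Set<Integer> truth = new HashSet<>();
        for (int doc : expected) {
            truth.add(doc);
        }
        long found = Arrays.stream(actual).filter(d -> truth.contains(d)).count();
        return (double) found / expected.length;
    }
}
```

The stand-in per-segment search here is exact, so the merged answer is the same for any `numSegments`. With real per-segment HNSW graphs, each search is approximate and every extra segment adds another graph to traverse. That is one way the thread/segment count can shift the recall and latency the benchmark reports.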

> Benchmark KNN search with ann-benchmarks
> ----------------------------------------
>
>                 Key: LUCENE-9625
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9625
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to
> make use of ann-benchmarks, which publishes results from many approximate
> KNN algorithms, including the HNSW implementation from its authors. We don't
> expect to challenge the performance of these native code libraries; however,
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses the
> KnnGraphTester class to run these:
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to 
> manually copy jars and the KnnGraphTester.class to the test host machine 
> rather than downloading from a distribution. KnnGraphTester needs some 
> modifications in order to support this process - this issue is mostly about 
> that.
> One thing I noticed is that some of the index builds with higher fanout
> (efConstruction) settings time out at 2h (on an AWS c5 instance), which is
> concerning; I'll open a separate issue for trying to improve that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

