[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540284#comment-17540284 ]
Michael Sokolov commented on LUCENE-9625:
-----------------------------------------

I realized maybe this deserves a better explanation: I didn't use multi-threading in the KnnGraphTester that builds the index since my goal at the time was really to evaluate whether our algorithm implementation is correct and how it performs on a single HNSW graph index. If we use multiple threads, this is going to lead to a more fragmented graph due to the way Lucene indexes segments, and while this would be a useful point of comparison, it also creates a different variable to tune in the benchmark evaluation.

If you do want to pursue this, I would suggest configuring the IndexWriterConfig with large buffers so that each thread creates a single segment, and exposing the number of threads/segments as a tunable parameter since it is going to impact the recall and latency reported by the benchmark.

> Benchmark KNN search with ann-benchmarks
> ----------------------------------------
>
>                 Key: LUCENE-9625
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9625
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to
> make use of ann-benchmarks, which is publishing results from many approximate
> knn algorithms, including the hnsw implementation from its authors. We don't
> expect to challenge the performance of these native code libraries, however
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses
> KnnGraphTester class to run these:
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to
> manually copy jars and the KnnGraphTester.class to the test host machine
> rather than downloading from a distribution. KnnGraphTester needs some
> modifications in order to support this process - this issue is mostly about
> that.
> One thing I noticed is that some of the index builds with higher fanout
> (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is
> concerning and I'll open a separate issue for trying to improve that.
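
For illustration only, the suggestion in the comment above (large buffers, so that each indexing thread ends up with a single segment) might be configured roughly as follows. This is a minimal sketch, not code from KnnGraphTester: the class and method names are hypothetical, the ramBufferMB value is assumed to be large enough to hold everything the threads index, and disabling merges with NoMergePolicy is an added assumption so that the per-thread segments are not combined afterwards.

```java
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;

/** Hypothetical helper for illustration; not part of KnnGraphTester. */
public final class OneSegmentPerThreadConfig {

  /**
   * Builds a writer config intended to avoid intermediate flushes.
   *
   * @param ramBufferMB RAM buffer size in MB; assumed large enough that the
   *     buffer never fills before the writer is committed or closed
   */
  static IndexWriterConfig create(double ramBufferMB) {
    return new IndexWriterConfig()
        .setOpenMode(IndexWriterConfig.OpenMode.CREATE)
        // Large buffer: flush on commit/close rather than when RAM fills up.
        .setRAMBufferSizeMB(ramBufferMB)
        // Do not flush based on document count.
        .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
        // Assumption: keep the flushed segments as they are, without merging.
        .setMergePolicy(NoMergePolicy.INSTANCE);
  }
}
```

With a config like this, the number of indexing threads would be the knob that roughly determines the segment count, which is why the comment proposes exposing it as a benchmark parameter. Lucene also enforces a separate per-thread RAM limit (see setRAMPerThreadHardLimitMB), so a very large index may still flush more than once per thread.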