shubhamvishu commented on PR #14963:
URL: https://github.com/apache/lucene/pull/14963#issuecomment-3353824447
> could you confirm the benchmark results?
@benwtrent Sure, I'll share the results with the latest changes soon.
> When I tried benchmarking, I didn't get anywhere near the numbers you got.
> Would be good to know the exact settings, etc. tested so we can replicate and
> see where the benefits are for this.
Below are the settings I used for the results above, if I'm not mistaken.
As we discussed earlier, I also tried a slightly higher threshold (shown
below), so I'd like to confirm how that affects the numbers now that we have
changed it. Let me know if I'm missing anything else. Thanks!
Dataset: Cohere
```
dim = 768
doc_vectors = f"{constants.BASE_DIR}/data/cohere-wikipedia-docs-5M-{dim}d.vec"
query_vectors = f"{constants.BASE_DIR}/data/cohere-wikipedia-queries-{dim}d.vec"
```
Graph Threshold
```
private static int graphCreationThreshold(int k, int numNodes) {
  return (int)
      Math.pow(
          10, String.valueOf(HnswGraphSearcher.expectedVisitedNodes(k, numNodes)).length());
}
```
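To make the rounding in the threshold easier to follow, here is a minimal Python sketch of the same arithmetic: it takes the expected visited-node count, counts its decimal digits, and returns 10 raised to that digit count. The helper names are hypothetical, and `expected_visited_nodes` is only a stand-in that assumes the Lucene method computes roughly `ln(numNodes) * k`; please check it against the actual `HnswGraphSearcher.expectedVisitedNodes` before relying on the numbers.

```python
import math


def round_up_by_digit_count(n: int) -> int:
    """Mirror of Math.pow(10, String.valueOf(n).length()) for non-negative n."""
    return 10 ** len(str(n))


def expected_visited_nodes(k: int, num_nodes: int) -> int:
    # ASSUMPTION: stand-in for HnswGraphSearcher.expectedVisitedNodes,
    # assumed here to be int(ln(num_nodes) * k). Not confirmed in this comment.
    return int(math.log(num_nodes) * k)


def graph_creation_threshold(k: int, num_nodes: int) -> int:
    # Same shape as the Java snippet above: digit count of the estimate,
    # then the power of ten with that many zeros.
    return round_up_by_digit_count(expected_visited_nodes(k, num_nodes))


if __name__ == "__main__":
    for n in (999, 1000, 1312):
        print(n, "->", round_up_by_digit_count(n))
```

Note that the result is always strictly greater than the estimate, so an estimate that is already an exact power of ten (for example 1000) maps to the next one (10000).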
```
PARAMS = {
    "ndoc": (500_000,),
    "maxConn": (64,),
    "beamWidthIndex": (250,),
    "fanout": (50,),
    "numMergeWorker": (12,),
    "numMergeThread": (4,),
    "numSearchThread": (0,),
    "encoding": ("float32",),
    "quantizeBits": (
        4,
        7,
        32,
    ),
    "topK": (100,),
    "quantizeCompress": (True,),
    "queryStartIndex": (0,),
    "forceMerge": (False,),
}
```
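For anyone replicating this, the sketch below shows how a tuple-valued grid like the one above would expand into individual runs, assuming the benchmark script takes the Cartesian product of all values (an assumption about the harness, not something stated in this comment). The `expand` helper is hypothetical and only a subset of the keys is reproduced for brevity.

```python
from itertools import product

# Subset of the PARAMS grid above; each value is a tuple of candidates.
PARAMS = {
    "ndoc": (500_000,),
    "maxConn": (64,),
    "beamWidthIndex": (250,),
    "quantizeBits": (4, 7, 32),
    "topK": (100,),
}


def expand(params):
    """Return one dict per combination of parameter values (hypothetical helper)."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*(params[k] for k in keys))]


runs = expand(PARAMS)
for run in runs:
    print(run)
```

Under that assumption, only `quantizeBits` has more than one value, so the grid yields three runs that differ only in quantization.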
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]