jmazanec15 commented on issue #11354: URL: https://github.com/apache/lucene/issues/11354#issuecomment-1369963611
Hi @msokolov,

> First, it looks to me as if we see some very nice improvement for the larger graphs, preserve the same recall, and changes to QPS are probably noise. I guess the assumption is we are producing similar results with less work?

Right. Basically, instead of adding the first 0-X ordinals to the graph one by one, we directly insert the nodes and their neighbors from the initializer graph into the merge graph, avoiding the search-for-neighbors step. I think the QPS differences are mostly noise. Recall is roughly the same, though not always exactly, because in the PR the random number generation sequence gets a bit mixed up.

> Just so we can understand these results a little better, could you document how you arrived at them? What dataset did you use? How did you measure the times and recall (was it using KnnGraphTester? luceneutil? some other benchmarking tool?).

Sure, I used the same procedure for the latest results as outlined here: https://github.com/apache/lucene/issues/11354#issuecomment-1239961308. I used the SIFT 1M 128-dimensional L2 dataset. This was using KnnGraphTester, controlling the number of initial segments and then force-merging to 1 segment.

> I'd also be curious to see the numbers and sizes of the segments in the results: I assume they would be unchanged from Control to Test, but it would be nice to be able to verify.

I would assume so too. Let me get these numbers as well; I will post them soon.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
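The initialization approach described above can be sketched roughly like this. This is a simplified, single-layer toy model, not Lucene's actual `HnswGraphBuilder` API; the class, method, and field names here are hypothetical, and a real merge would also handle multiple graph levels and deleted documents:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of seeding a merged HNSW graph from an existing segment's graph. */
class GraphMergeSketch {
    // adjacency list: node ordinal -> neighbor ordinals (one layer, for simplicity)
    final Map<Integer, int[]> neighbors = new HashMap<>();

    /**
     * Copy nodes and their neighbor lists from an initializer graph into this
     * graph, remapping old segment ordinals to new merged-segment ordinals.
     * This skips the expensive search-for-neighbors step that inserting each
     * node from scratch would require.
     */
    void initializeFrom(Map<Integer, int[]> initGraph, Map<Integer, Integer> oldToNew) {
        for (Map.Entry<Integer, int[]> e : initGraph.entrySet()) {
            int newOrd = oldToNew.get(e.getKey());
            int[] oldNeighbors = e.getValue();
            int[] remapped = new int[oldNeighbors.length];
            for (int i = 0; i < oldNeighbors.length; i++) {
                remapped[i] = oldToNew.get(oldNeighbors[i]);
            }
            neighbors.put(newOrd, remapped);
        }
        // Nodes from the other segments would then be inserted normally,
        // using the usual HNSW search-and-connect procedure.
    }
}
```

Control would insert every node via graph search; Test only pays that cost for nodes outside the initializer segment, which is where the merge-time savings come from.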