jtibshirani edited a comment on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-594242054

**Benchmarks**

sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000       6.425
LuceneCluster(n_probes=5)     0.749     574.186
LuceneCluster(n_probes=10)    0.874     308.455
LuceneCluster(n_probes=20)    0.951     116.871
LuceneCluster(n_probes=50)    0.993      67.354
LuceneCluster(n_probes=100)   0.999      34.651
```

glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.
```
APPROACH                      RECALL    QPS
LuceneExact()                 1.000       6.722
LuceneCluster(n_probes=5)     0.680     618.438
LuceneCluster(n_probes=10)    0.766     335.956
LuceneCluster(n_probes=20)    0.835     173.782
LuceneCluster(n_probes=50)    0.905      72.747
LuceneCluster(n_probes=100)   0.948      37.339
```

These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). I hooked up the prototype to the benchmarking framework using py4j (e10d34c73dc391e4a105253f6181dfc0e9cb6705). Unfortunately, py4j adds quite a bit of overhead (~3 ms per search), so I had to measure that overhead and subtract it from the results. This is really not ideal; I will work on more robust benchmarks.
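For readers who want to see what subtracting a fixed per-search overhead does to a QPS figure, here is a minimal sketch. It is not the script used for the numbers above. It assumes queries run one at a time, so that latency is simply `1 / QPS`, and that the bridge overhead is constant and additive. The function name `corrected_qps` and the input values are hypothetical and chosen only for illustration.

```python
def corrected_qps(measured_qps: float, overhead_s: float) -> float:
    """Return QPS after removing a fixed per-query overhead.

    Assumes sequential queries, so per-query latency is 1 / QPS,
    and that the overhead (in seconds) is the same for every query.
    """
    if measured_qps <= 0:
        raise ValueError("measured_qps must be positive")
    measured_latency = 1.0 / measured_qps
    corrected_latency = measured_latency - overhead_s
    if corrected_latency <= 0:
        # The overhead estimate is at least as large as the whole
        # measurement, so no meaningful correction is possible.
        raise ValueError("overhead exceeds measured latency")
    return 1.0 / corrected_latency


# Hypothetical example: 100 QPS measured through the bridge, 3 ms overhead.
# 10 ms measured latency minus 3 ms leaves 7 ms, i.e. roughly 142.9 QPS.
print(round(corrected_qps(100.0, 0.003), 1))
```

Note that the correction grows as latency shrinks: at a few hundred QPS a 3 ms overhead is comparable to the search time itself, so small errors in the overhead estimate translate into large swings in the corrected figure. That sensitivity is one reason a harness without the bridge would give more robust numbers.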