kaivalnp commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3033770135
> if the results hold up for 768d vectors

Thanks @msokolov! I quantized the Cohere 768d vectors by:

1. normalizing
2. scaling each dimension by 256
3. clipping between [-128, 127]

i.e. basically replicating [this snippet](https://github.com/mikemccand/luceneutil/blob/522d3c377815686ea8a801d6735f68ae69d637d5/src/main/perf/VectorDictionary.java#L197-L198) (a rough sketch of these steps is at the end of this comment).

But something _**really**_ strange is happening! I first ran a set of benchmarks with `-reindex` to create fresh indices and make sure indexing times are not adversely affected:

`main` (run 1)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.940   1.568        1.567   1.000        200000  100   50      64       250        no         63.52     3148.47       23.21           1             152.11          585.938       585.938      HNSW
```

This PR (run 2)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.939   2.894        2.894   1.000        200000  100   50      64       250        no         37.69     5306.87       12.99           1             152.09          585.938       585.938      HNSW
```

This felt a bit strange: query latency is higher here (conflicting with the JMH results) while indexing is faster in this PR. So I ran a follow-up _without `-reindex`_, searching against an already-built index (re-using the index created in run 2 for runs 3 and 4):

`main` (run 3)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.939   1.602        1.601   0.999        200000  100   50      64       250        no         0.00      Infinity      0.12            1             152.09          585.938       585.938      HNSW
```

This PR (run 4)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.939   1.090        1.090   1.000        200000  100   50      64       250        no         0.00      Infinity      0.12            1             152.09          585.938       585.938      HNSW
```

Search performance on `main` is unchanged, but we see a significant improvement after moving the query off-heap! I don't have a good explanation for this yet; I'll try to replicate on 300d vectors to see if it holds there too. Meanwhile, it'd be great if you could reproduce the benchmarks like I did :)
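For reference, here is a minimal Java sketch of the quantization steps described above. It is illustrative only: the `QuantizeSketch` class and `quantize` method are made-up names, not the exact luceneutil code linked earlier, and it assumes a non-zero input vector. Clipping is needed because a unit-normalized dimension scaled by 256 can land just outside the signed-byte range.

```java
// Illustrative sketch of: 1. normalize, 2. scale by 256, 3. clip to [-128, 127].
// Not the exact VectorDictionary code; names here are made up.
class QuantizeSketch {
  static byte[] quantize(float[] vector) {
    // 1. compute the L2 norm for normalization (assumes a non-zero vector)
    double norm = 0;
    for (float v : vector) {
      norm += (double) v * v;
    }
    norm = Math.sqrt(norm);

    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      // 2. normalize and scale each dimension by 256
      long scaled = Math.round(vector[i] / norm * 256);
      // 3. clip to the signed byte range [-128, 127]
      quantized[i] = (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, scaled));
    }
    return quantized;
  }
}
```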