kaivalnp commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3016573861
I ran some benchmarks on Cohere vectors (768d) for 7-bit and 4-bit (compressed) quantization.

`main` without `jdk.incubator.vector`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.860        2.815   2.806        0.997  100000   100      50       64        250     7 bits     44.07       2269.17           46.79             1          373.72       366.592       73.624       HNSW
 0.545        3.193   3.185        0.997  100000   100      50       64        250     4 bits     47.26       2115.95           50.04             1          338.13       329.971       37.003       HNSW
```

`main` with `jdk.incubator.vector`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.863        1.904   1.886        0.991  100000   100      50       64        250     7 bits     28.65       3490.65           29.66             1          373.69       366.592       73.624       HNSW
 0.545        1.313   1.305        0.994  100000   100      50       64        250     4 bits     22.86       4373.88           17.84             1          338.13       329.971       37.003       HNSW
```

This PR without `jdk.incubator.vector`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.861        2.774   2.765        0.997  100000   100      50       64        250     7 bits     44.60       2242.00           46.71             1          373.73       366.592       73.624       HNSW
 0.545        3.147   3.139        0.997  100000   100      50       64        250     4 bits     47.93       2086.51           50.20             1          338.11       329.971       37.003       HNSW
```

This PR with `jdk.incubator.vector`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.861        1.612   1.603        0.994  100000   100      50       64        250     7 bits     22.99       4349.53           24.78             1          373.70       366.592       73.624       HNSW
 0.545        1.277   1.269        0.994  100000   100      50       64        250     4 bits     21.60       4630.49           17.41             1          338.11       329.971       37.003       HNSW
```

I did see slight fluctuation across runs, but the search time was ~10% faster for 7-bit and very slightly faster for 4-bit (compressed).
Indexing and force merge times have improved by ~15%.
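For anyone who wants to re-derive the relative changes from the tables, here is a minimal sketch that compares `main` against this PR for the runs with `jdk.incubator.vector` enabled. The numbers are copied from the tables in this comment; the dictionary layout and the helper names (`pct_reduction`, `compare`) are only illustrative and are not part of Lucene or the benchmark tooling.

```python
# Values copied from the "with jdk.incubator.vector" tables in this comment.
# The structure and names below are hypothetical, chosen for illustration.
MAIN = {
    "7 bits": {"latency_ms": 1.904, "index_s": 28.65, "force_merge_s": 29.66},
    "4 bits": {"latency_ms": 1.313, "index_s": 22.86, "force_merge_s": 17.84},
}
PR = {
    "7 bits": {"latency_ms": 1.612, "index_s": 22.99, "force_merge_s": 24.78},
    "4 bits": {"latency_ms": 1.277, "index_s": 21.60, "force_merge_s": 17.41},
}


def pct_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


def compare(baseline: dict, candidate: dict) -> dict:
    """Per-quantization, per-metric percentage reduction, rounded to 0.1."""
    return {
        bits: {
            metric: round(pct_reduction(value, candidate[bits][metric]), 1)
            for metric, value in metrics.items()
        }
        for bits, metrics in baseline.items()
    }


if __name__ == "__main__":
    for bits, metrics in compare(MAIN, PR).items():
        for metric, pct in metrics.items():
            print(f"{bits:7s} {metric:14s} {pct:5.1f}% lower in this PR")
```

These are single-run figures, and as noted above the results fluctuate across runs, so the computed percentages need not match the rounded summary exactly.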