kaivalnp commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3133877677
We had some interesting findings in #14874, so I updated this PR to reflect those. Benchmarks for 768d Cohere vectors (dot product similarity; the 4-bit index is compressed):

`main` with `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.607   1.606        1.000  100000   100      50       64        250     7 bits     39.43       2536.33           18.73             1          373.37       366.592       73.624       HNSW
 0.542        1.211   1.210        0.999  100000   100      50       64        250     4 bits     34.09       2933.76           15.33             1          337.76       329.971       37.003       HNSW
```

`main` without `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.448   1.447        0.999  100000   100      50       64        250     7 bits      0.00      Infinity            0.12             1          373.37       366.592       73.624       HNSW
 0.542        1.075   1.074        0.999  100000   100      50       64        250     4 bits      0.00      Infinity            0.12             1          337.76       329.971       37.003       HNSW
```

This PR with `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.394   1.392        0.999  100000   100      50       64        250     7 bits     36.29       2755.73           17.35             1          373.36       366.592       73.624       HNSW
 0.541        1.135   1.134        0.999  100000   100      50       64        250     4 bits     34.85       2869.11           15.53             1          337.74       329.971       37.003       HNSW
```

This PR without `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.329   1.328        0.999  100000   100      50       64        250     7 bits      0.00      Infinity            0.12             1          373.36       366.592       73.624       HNSW
 0.541        1.141   1.139        0.998  100000   100      50       64        250     4 bits      0.00      Infinity            0.13             1          337.74       329.971       37.003       HNSW
```

We see some speedup in both indexing and search for 7-bit compression, but not so much for 4-bit. I wrote specific functions in `PanamaVectorUtilSupport`
to compute the "dot product" between two compressed (i.e. "packed") 4-bit integer vectors, so there is no need to copy them to heap and decompress. These are used during indexing, and are inspired by an existing function that assumed one of the vectors was uncompressed (i.e. "unpacked").

The drawback is that we would also need dedicated functions for comparing "packed" query / document vectors under other similarity functions like "euclidean" -- I'm looking for input on whether the gains justify maintaining those functions.

Also tagging people who might be interested in these changes, maybe @benwtrent (since you were exploring something similar in #13497) or @ChrisHegarty?
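For readers unfamiliar with the "packed" representation: each byte stores two 4-bit values, so a comparison over packed vectors extracts and multiplies nibbles in place instead of first expanding each vector to one byte per dimension. Below is a minimal scalar sketch of that idea; the class and method names are hypothetical, the nibble layout (high/low within one byte) is an assumption for illustration, and the actual PR implements this with Panama Vector API intrinsics in `PanamaVectorUtilSupport`, not scalar code.

```java
// Hypothetical scalar illustration of a packed 4-bit dot product.
// Each byte of a and b holds two unsigned 4-bit values (one in the high
// nibble, one in the low nibble); layout is assumed, not Lucene's exact one.
public class PackedInt4DotProduct {

  /** Dot product of two packed 4-bit vectors of equal packed length. */
  public static int dotProductPacked(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      // low nibbles: mask to 0..15 (avoids Java's byte sign extension)
      total += (a[i] & 0x0F) * (b[i] & 0x0F);
      // high nibbles: shift down, then mask for the same reason
      total += ((a[i] >> 4) & 0x0F) * ((b[i] >> 4) & 0x0F);
    }
    return total;
  }

  public static void main(String[] args) {
    // 0x12 packs (high=1, low=2); 0x21 packs (high=2, low=1)
    System.out.println(dotProductPacked(new byte[] {0x12}, new byte[] {0x21})); // 2*1 + 1*2 = 4
  }
}
```

The point of operating on the packed form is that indexing-time comparisons read half the bytes and skip the decompress-to-heap copy entirely, which is where the 7-bit path gets its win; for 4-bit, the extra nibble extraction appears to eat most of that saving.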