benwtrent commented on PR #13197: URL: https://github.com/apache/lucene/pull/13197#issuecomment-2021292348
I did a bunch of local benchmarking on this. I am adding a parameter to allow optional compression, as the numbers without compression are compelling enough on ARM to justify it IMO.

To achieve similar recall, `int4` without compression is about 30% faster. With compression it's about 30% slower, but with 50% of the memory requirements.

Here are some latency vs. recall numbers for int7, and for int4 with this change.

```
import matplotlib.pyplot as plt
plt.plot([1.49, 1.53, 1.54, 1.83, 2.09], [0.952, 0.962, 0.965, 0.974, 0.981], marker='o', label='int7')
plt.plot([1.72, 1.75, 1.79, 2.04, 2.48], [0.897, 0.915, 0.929, 0.971, 0.980], marker='o', label='int4_compressed')
plt.plot([1.08, 1.12, 1.12, 1.34, 1.50], [0.897, 0.915, 0.929, 0.971, 0.980], marker='o', label='int4')
plt.legend()
plt.show()
```

int4 with compression gives a 2x space improvement over int7, but it comes at an obvious cost as we have to (un)pack bytes during dot-products.

Here are the numbers around index building as well. I committed every 1MB to ensure merging occurred and that force-merging was adequately exercised.

Int4 no compression:
```
Indexed 500000 documents in 312090ms
Force merge done in: 76169 ms
```

Int4 compression:
```
Indexed 500000 documents in 326978ms
Force merge done in: 124961 ms
```

Int7:
```
Indexed 500000 documents in 344584ms
Force merge done in: 98311 ms
```
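To make the (un)packing cost concrete, here is a minimal sketch of what storing two 4-bit values per byte implies for a dot product. It is an illustration only and not the code in this PR: the function names `pack_int4`, `dot_unpacked`, and `dot_packed`, and the nibble layout (even index in the high nibble), are assumptions made for the example.

```python
def pack_int4(values):
    """Pack ints in [0, 15] two per byte (assumed layout: even index in the high nibble)."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    return bytes((values[i] << 4) | values[i + 1] for i in range(0, len(values), 2))


def dot_unpacked(a, b):
    """Dot product when each 4-bit value occupies its own byte."""
    return sum(x * y for x, y in zip(a, b))


def dot_packed(query, packed):
    """Dot product of an unpacked query against a packed vector.

    Every stored byte needs an extra shift and mask before it can be multiplied.
    """
    total = 0
    for i, byte in enumerate(packed):
        total += query[2 * i] * (byte >> 4)        # high nibble
        total += query[2 * i + 1] * (byte & 0x0F)  # low nibble
    return total


doc = [3, 15, 0, 7, 9, 1]
query = [1, 2, 3, 4, 5, 6]
packed = pack_int4(doc)

# Packing halves the storage, and both paths agree on the result.
assert len(packed) == len(doc) // 2
assert dot_packed(query, packed) == dot_unpacked(query, doc)
```

The packed path stores half as many bytes but pays for a shift and a mask per stored byte, which is the kind of overhead behind the latency gap between `int4` and `int4_compressed` above.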