benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2359272883
Here are some more flat index test results. The goal was to see how the number of coarse-grained centroids affects recall and speed.

| Lucene912BinaryQuantizedVectorsFormat | recall@100 | recall@100\|500 | recall@100\|1000 | index time & force-merge | mean search time (brute-force) |
|--------------------------------------------------------|------------|-----------------|------------------|--------------------------|--------------------------------|
| hotpotqa-E5-small (5233329 vectors, 1 centroid)        | 47         | 81              | 90               | 79618ms                  | 177ms                          |
| hotpotqa-E5-small (5233329 vectors, 255 centroids)     | 52         | 86              | 94               | 267522ms                 | 211ms                          |
| hotpotqa-gte-base (5233329 vectors, 1 centroid)        | 66         | 96              | 99               | 214254ms                 | 227ms                          |
| hotpotqa-gte-base (5233329 vectors, 255 centroids)     | 71         | 98              | 99               | 617887ms                 | 275ms                          |
| dbpedia-entity-arctic (4635922 vectors, 1 centroid)    | 56         | 89              | 95               | 199827ms                 | 205ms                          |
| dbpedia-entity-arctic (4635922 vectors, 255 centroids) | 59         | 91              | 96               | 618797ms                 | 220ms                          |

Models used:
- https://huggingface.co/thenlper/gte-base
- https://huggingface.co/intfloat/e5-small
- https://huggingface.co/Snowflake/snowflake-arctic-embed-m
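To make the trade-off in these results easier to read, here is a small sketch that compares the 255-centroid runs against the 1-centroid runs per dataset. The numbers are copied from the results in this comment; the names `ROWS` and `compare` are illustrative only, and the script assumes the recall columns are percentages, so the recall difference is reported in percentage points.

```python
# Rows copied from the reported results:
# (dataset, centroids, recall@100, index + force-merge ms, mean search ms)
ROWS = [
    ("hotpotqa-E5-small", 1, 47, 79618, 177),
    ("hotpotqa-E5-small", 255, 52, 267522, 211),
    ("hotpotqa-gte-base", 1, 66, 214254, 227),
    ("hotpotqa-gte-base", 255, 71, 617887, 275),
    ("dbpedia-entity-arctic", 1, 56, 199827, 205),
    ("dbpedia-entity-arctic", 255, 59, 618797, 220),
]


def compare(rows):
    """Compare the 255-centroid run with the 1-centroid run for each dataset."""
    by_dataset = {}
    for name, centroids, recall, index_ms, search_ms in rows:
        by_dataset.setdefault(name, {})[centroids] = (recall, index_ms, search_ms)

    summary = {}
    for name, runs in by_dataset.items():
        recall_1, index_1, search_1 = runs[1]
        recall_255, index_255, search_255 = runs[255]
        summary[name] = {
            # Assumption: recall is a percentage, so this is in percentage points.
            "recall_gain_points": recall_255 - recall_1,
            "index_time_ratio": round(index_255 / index_1, 2),
            "search_time_ratio": round(search_255 / search_1, 2),
        }
    return summary


SUMMARY = compare(ROWS)
for name, stats in SUMMARY.items():
    print(name, stats)
```

On these three datasets, going from 1 to 255 centroids adds 3 to 5 points of recall@100, while index plus force-merge time grows by roughly 3x (2.88x to 3.36x) and mean brute-force search time grows by about 7% to 21%.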