benwtrent opened a new issue, #15064:
URL: https://github.com/apache/lucene/issues/15064
### Description

@vigyasharma found that single-bit quantization is actually really nice, and sometimes better than the current half-byte quantization: https://github.com/mikemccand/luceneutil/pull/435

As a user, this doesn't make sense: more bits should mean more information. The catch is that the current scalar quantization formats use the old `ScalarQuantizer`. We should consider switching the formats to use `OptimizedScalarQuantizer`.

This will not be a simple change, but I don't think it should be TOO complicated (at least no more complicated than any other format change):

- The on-disk layout of the vectors needs to change (more corrective values).
- More metadata needs to be retained (e.g. the centroid).
- Different vector scorers are needed (the scoring function is different).

But I would expect half-byte (4-bit) quantization to achieve 90%+ recall without any oversampling and reranking for larger vector dimensions. 7-bit (yeah, we likely still need to worry about signed int8 being busted...) should be even better.

Again, I expect this to be a big 'ole chunk of work :/.
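To make the three bullet points concrete, here is a simplified sketch, assuming a centroid-centered scalar quantizer with a per-vector interval. It is not Lucene's code: `CorrectedScalarQuantizationSketch`, `QuantizedVector`, `quantize`, and `estimateDot` are hypothetical names. Each vector is stored as integer codes of `x - centroid` plus a few corrective floats (interval lower bound, step size, code sum, and `centroid . residual`), the index keeps `centroid . centroid` as metadata, and the scorer combines an integer dot product with those corrections. It does not attempt the interval optimization the name `OptimizedScalarQuantizer` refers to; it simply uses the residual's min and max.

```java
import java.util.Random;

/** Hypothetical sketch of centroid-centered scalar quantization with corrective values (not Lucene's API). */
public class CorrectedScalarQuantizationSketch {

    /** What a format would store per vector: the codes plus corrective values. */
    record QuantizedVector(int[] codes, float lower, float step, int codeSum, float centroidDot) {}

    /** Quantizes (x - centroid) to `bits` bits over the residual's plain [min, max] interval. */
    static QuantizedVector quantize(float[] x, float[] centroid, int bits) {
        int levels = (1 << bits) - 1;
        int dims = x.length;
        float[] residual = new float[dims];
        float lo = Float.POSITIVE_INFINITY, hi = Float.NEGATIVE_INFINITY;
        float centroidDot = 0f;
        for (int i = 0; i < dims; i++) {
            residual[i] = x[i] - centroid[i];
            lo = Math.min(lo, residual[i]);
            hi = Math.max(hi, residual[i]);
            centroidDot += centroid[i] * residual[i]; // exact c . r, kept as a correction
        }
        float step = (hi - lo) / levels;
        int[] codes = new int[dims];
        int codeSum = 0;
        for (int i = 0; i < dims; i++) {
            int q = step == 0f ? 0 : Math.round((residual[i] - lo) / step);
            codes[i] = Math.max(0, Math.min(levels, q)); // clamp to [0, levels]
            codeSum += codes[i];
        }
        return new QuantizedVector(codes, lo, step, codeSum, centroidDot);
    }

    /**
     * Estimates x . y = c.c + c.rx + c.ry + rx.ry, where each residual is
     * approximated as lower + step * code, so rx.ry expands into an integer
     * dot product plus per-vector corrective terms.
     */
    static float estimateDot(QuantizedVector x, QuantizedVector y, float centroidNormSq) {
        int dims = x.codes().length;
        long intDot = 0; // the hot loop: pure integer arithmetic
        for (int i = 0; i < dims; i++) {
            intDot += (long) x.codes()[i] * y.codes()[i];
        }
        float residualDot = dims * x.lower() * y.lower()
                + x.lower() * y.step() * y.codeSum()
                + y.lower() * x.step() * x.codeSum()
                + x.step() * y.step() * intDot;
        return centroidNormSq + x.centroidDot() + y.centroidDot() + residualDot;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Residuals {0,1,2,3} and {3,2,1,0} sit exactly on a 2-bit grid, so the estimate is exact.
        float[] centroid = {1f, 1f, 1f, 1f};
        float[] x = {1f, 2f, 3f, 4f};
        float[] y = {4f, 3f, 2f, 1f};
        QuantizedVector qx = quantize(x, centroid, 2);
        QuantizedVector qy = quantize(y, centroid, 2);
        float estimate = estimateDot(qx, qy, dot(centroid, centroid));
        System.out.println("estimate=" + estimate + " exact=" + dot(x, y)); // prints estimate=20.0 exact=20.0

        // Random 64-dim vectors at 7 bits: the estimate lands close to the exact dot product.
        Random rnd = new Random(42);
        float[] a = new float[64], b = new float[64], zero = new float[64];
        for (int i = 0; i < 64; i++) {
            a[i] = rnd.nextFloat() * 2f - 1f;
            b[i] = rnd.nextFloat() * 2f - 1f;
        }
        float approx = estimateDot(quantize(a, zero, 7), quantize(b, zero, 7), 0f);
        System.out.println("7-bit estimate=" + approx + " exact=" + dot(a, b));
    }
}
```

The design point is that only the integer dot product touches every dimension; the corrective terms cost O(1) per pair. That is why a switch of quantizer means new per-vector values on disk, a stored centroid, and a different scorer rather than just a different number of bits.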