benwtrent opened a new issue, #13251: URL: https://github.com/apache/lucene/issues/13251
### Description

Tangentially related to: https://github.com/apache/lucene/issues/13158

I have observed that once the corpus reaches a fairly large size, the actual quantiles barely change during segment merges. This is tricky to measure fully, and it is hard to make a promise about lossiness (users can always start throwing garbage that shakes up the whole world). But if the data isn't a "bad actor", quantiles and quantization buckets become fairly stable over time.

Maybe we should add a configuration option, a new codec, or some other way to drop the raw floating point vectors. The 4x reduction in disk usage would be really nice for many use cases.

I am not 100% sure how this would look (a threshold provided by the user, or we just do it based on internal statistics).
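
To make the user-provided-threshold variant concrete, here is a rough sketch in Python rather than Lucene code. Every name in it (`quantile_bounds`, `quantile_drift`, `can_drop_raw_vectors`, `max_drift`, `bytes_per_vector`) is hypothetical and does not correspond to any existing Lucene API. It assumes the quantization bucket range is defined by a pair of lower/upper quantiles, and that "stable" means those bounds moved by less than a small fraction of the bucket range across a merge. The size estimate counts only the vector components (4-byte floats versus 1-byte quantized values) and ignores any per-vector metadata or index overhead.

```python
import random


def quantile_bounds(values, confidence=0.99):
    """Return (lower, upper) cut points keeping the central `confidence` mass."""
    ordered = sorted(values)
    tail = (1.0 - confidence) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def quantile_drift(old_bounds, new_bounds):
    """Largest shift of either bound, relative to the old bucket range."""
    old_lo, old_hi = old_bounds
    new_lo, new_hi = new_bounds
    width = old_hi - old_lo
    if width <= 0:
        return float("inf")  # degenerate range: never treat as stable
    return max(abs(new_lo - old_lo), abs(new_hi - old_hi)) / width


def can_drop_raw_vectors(old_bounds, new_bounds, max_drift=0.01):
    """Hypothetical policy: drop raw floats only if the bounds barely moved."""
    return quantile_drift(old_bounds, new_bounds) <= max_drift


def bytes_per_vector(dims, quantized):
    """Component storage only: 1 byte per quantized value, 4 per float32."""
    return dims * (1 if quantized else 4)


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in for the component values of a large existing segment.
    existing = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
    # Stand-in for a small incoming segment from the same distribution.
    incoming = [rng.gauss(0.0, 1.0) for _ in range(1_000)]

    before = quantile_bounds(existing)
    after = quantile_bounds(existing + incoming)
    print("drift:", quantile_drift(before, after))
    print("drop raw vectors:", can_drop_raw_vectors(before, after))
    print("size ratio:", bytes_per_vector(768, False) / bytes_per_vector(768, True))
```

The internal-statistics variant would amount to choosing `max_drift` automatically, for example from the drift observed over previous merges, instead of asking the user for it.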