benwtrent opened a new issue, #13251:
URL: https://github.com/apache/lucene/issues/13251

   ### Description
   
   Tangentially related to: https://github.com/apache/lucene/issues/13158
   
   But I have observed that once the corpus reaches a fairly large size, the 
quantiles barely change during segment merges. It is tricky to measure this 
fully or to promise a bound on lossiness (users can always start indexing 
garbage that shifts the whole distribution). But as long as the data isn't a 
"bad actor", the quantiles and quantization buckets become fairly stable over 
time.
   
   Maybe we should add a configuration option, a new codec, or some other way 
to drop the raw floating-point vectors. The 4x reduction in disk usage would be 
really nice for many use cases.
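
   To make the size argument concrete, here is a minimal sketch of scalar quantization. It assumes the raw vectors are 32-bit floats and the quantized form uses one byte per dimension, which the issue does not state explicitly. The function names and the clip-and-scale formula are illustrative only and are not Lucene's implementation, and the size comparison ignores any per-vector or per-segment metadata.

```python
import struct


def quantize(vector, lower, upper, bits=8):
    """Clip each value to [lower, upper] and map it to an integer code.

    lower/upper stand in for the lower and upper quantiles of the data.
    Hypothetical helper, not a Lucene API.
    """
    levels = (1 << bits) - 1
    scale = levels / (upper - lower)
    codes = bytearray()
    for value in vector:
        clipped = min(max(value, lower), upper)
        codes.append(round((clipped - lower) * scale))
    return bytes(codes)


def dequantize(codes, lower, upper, bits=8):
    """Map integer codes back to approximate float values."""
    levels = (1 << bits) - 1
    step = (upper - lower) / levels
    return [lower + code * step for code in codes]


def raw_bytes(vector):
    """Size of the vector when stored as little-endian float32 values."""
    return len(struct.pack(f"<{len(vector)}f", *vector))


vector = [-1.0, -0.25, 0.5, 1.0, 5.0]  # 5.0 is an outlier beyond the upper quantile
codes = quantize(vector, lower=-1.0, upper=1.0)
approx = dequantize(codes, lower=-1.0, upper=1.0)
ratio = raw_bytes(vector) / len(codes)  # float32 bytes vs. one byte per dimension
```

   Once the raw vectors are gone, values outside the quantile range (like the outlier above) can only be recovered as the clipped bound, and the quantiles can no longer be recomputed from the original data. That is why the stability of the quantiles matters for this proposal.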
   
   I am not 100% sure how this would look: either a threshold provided by the 
user, or a decision we make automatically based on internal statistics.
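
   One way the threshold variant could look is sketched below. Everything in it is an assumption made for illustration: the helper names, the 0.99 confidence interval, the 1% default threshold, and the choice to measure drift relative to the width of the old quantile range. It compares the quantiles of the values seen before a merge with the quantiles after it, and only reports the raw vectors as droppable when the drift stays under the threshold.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty sequence, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def quantile_drift(old_values, new_values, confidence=0.99):
    """Largest shift of the lower/upper quantile, relative to the old range width."""
    tail = (1 - confidence) / 2
    old_lo, old_hi = quantile(old_values, tail), quantile(old_values, 1 - tail)
    new_lo, new_hi = quantile(new_values, tail), quantile(new_values, 1 - tail)
    width = old_hi - old_lo
    shift = max(abs(new_lo - old_lo), abs(new_hi - old_hi))
    if width == 0:
        # Degenerate old range: any shift at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / width


def can_drop_raw_vectors(old_values, new_values, max_drift=0.01):
    """Hypothetical policy: drop raw vectors only if the quantiles barely moved."""
    return quantile_drift(old_values, new_values) <= max_drift


stable = [i / 100 for i in range(101)]
shifted = [10 * value for value in stable]  # a "bad actor" batch that widens the range

stable_ok = can_drop_raw_vectors(stable, stable)
shifted_ok = can_drop_raw_vectors(stable, shifted)
```

   A statistics-based variant could use the same drift measure but track it across several consecutive merges instead of asking the user for `max_drift`.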


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


