msokolov commented on issue #12497: URL: https://github.com/apache/lucene/issues/12497#issuecomment-1676430028
Q: have we considered choosing the quantization parameters from the first segment seen and reusing them for all subsequent segments? Or providing an API through which quantization parameters can be supplied? Either approach would seem to offer an added benefit: we would not need to store the original vectors, since re-quantization would never be required. Looking at product-quantization schemes based on k-means clustering, this seems to be the usual approach (train the k-means offline on a subset of the vectors, then use the resulting centroids for all the vectors).
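
To make the question concrete, here is a minimal sketch of the "train once, reuse everywhere" idea using simple min/max scalar quantization. It is not Lucene code: `ScalarQuantizer`, `train`, and `quantize` are hypothetical names, and the min/max range and 256 levels are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ScalarQuantizer:
    """Fixed quantization parameters (hypothetical helper, not a Lucene class)."""

    lower: float
    upper: float
    levels: int = 256

    @classmethod
    def train(cls, vectors: Sequence[Sequence[float]], levels: int = 256) -> "ScalarQuantizer":
        # Derive the range once, e.g. from the first segment or an offline sample.
        values = [x for vector in vectors for x in vector]
        lower, upper = min(values), max(values)
        if lower == upper:
            raise ValueError("training vectors must span a non-empty range")
        return cls(lower, upper, levels)

    def quantize(self, vector: Sequence[float]) -> List[int]:
        # Map each component onto [0, levels - 1] with the fixed parameters.
        scale = (self.levels - 1) / (self.upper - self.lower)
        codes = []
        for x in vector:
            # Values outside the trained range are clipped, not re-trained.
            clipped = min(max(x, self.lower), self.upper)
            codes.append(round((clipped - self.lower) * scale))
        return codes


# Parameters come from the first segment only.
first_segment = [[0.0, 0.5], [1.0, -1.0]]
quantizer = ScalarQuantizer.train(first_segment)

# A later segment is quantized with the same parameters; its raw floats
# are not needed afterwards because the parameters never change.
later_segment = [[0.25, 2.0]]
later_codes = [quantizer.quantize(v) for v in later_segment]
```

The trade-off this exposes is that values in later segments falling outside the trained range (2.0 above) are clipped, so accuracy would depend on how representative the first segment or training sample is.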