gautamworah96 opened a new issue, #13403: URL: https://github.com/apache/lucene/issues/13403
### Description

I am opening this issue as a discussion topic.

With the advances in int8 and int4 quantized vector storage, I believe Lucene takes unquantized vectors as input, computes the quantized values itself, and then indexes them.

Another technique that experimenters use to improve vector search is reducing the number of dimensions. In practice, this means applying PCA (Principal Component Analysis) or a similar technique.

Should Lucene implement support for PCA or other dimensionality reduction techniques internally (or perhaps add a hook for them)? Or can we rely on users preprocessing their vectors before supplying them?

I am undecided on whether a "search" and "information retrieval" library should add advanced statistics functionality (if I may call PCA that).
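To make the "user preprocesses their vectors" option concrete, here is a minimal sketch of PCA done outside Lucene, before indexing. It is not a Lucene API and not a proposal for one: `fit_pca` and `project` are hypothetical helper names, the power-iteration approach is just one simple way to find principal directions, and a real pipeline would more likely use an established numerical library.

```python
import math
import random


def fit_pca(vectors, k, iterations=200, seed=0):
    """Return (mean, components) for the top-k principal directions.

    Hypothetical helper: uses power iteration on the sample covariance
    matrix, orthogonalizing against components already found.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            # One power-iteration step: multiply by the covariance matrix.
            w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
            # Remove any part lying along previously found components.
            for c in components:
                dot = sum(x * y for x, y in zip(w, c))
                w = [x - dot * y for x, y in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            w = [x / norm for x in w]
        components.append(w)
    return mean, components


def project(vector, mean, components):
    """Center a vector and project it onto the learned components."""
    return [sum((vector[j] - mean[j]) * c[j] for j in range(len(vector)))
            for c in components]


# Usage: learn the projection from a sample, then reduce each vector.
sample = [[float(t), 2.0 * t, 0.0] for t in range(-5, 6)]
mean, components = fit_pca(sample, k=1)
reduced = project([1.0, 2.0, 0.0], mean, components)
```

With this approach the reduced vectors would be what gets indexed (for example through `KnnFloatVectorField`), and every query vector would have to be projected with the same `mean` and `components`. Keeping that projection in sync between indexing and search is the burden that falls on the user if Lucene stays out of it.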