Pulkitg64 commented on PR #14792: URL: https://github.com/apache/lucene/pull/14792#issuecomment-2977859146
We are experimenting with large vector indexes. Because raw (unquantized) vectors consume significant disk space (4x more than quantized vectors), we want to drop the raw vectors from searcher machines. We currently use vector values for the following use cases:

1. Calculating dot-product scores and returning them in search results
2. Returning the vectors in search results
3. Counting vectors for metrics

For use case 1, we have started using [vectorScorer](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java#L444), which uses quantized vectors to compute the score, so we are good there.

For use cases 2 and 3, we currently read floatVectorValues via [getFloatVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java#L189), but we need to switch to quantizedVectorValues because searchers will no longer have float vectors. We are okay with accepting the accuracy loss from float-to-byte quantization.

To address these use cases, we have two options:

* Introduce a new API, getQuantizedVectorValues, to access quantizedByteVector, OR
* Use our local workaround: make the [QuantizedVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java#L408) class and its members public so that quantized vectors can be accessed directly

I would like to know your thoughts on whether we should create such an API. If you think the above use cases don't justify a new API, what are your thoughts on implementing the workaround and pushing it upstream?
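To make the trade-off behind use case 2 concrete, here is a minimal, self-contained sketch of float-to-byte scalar quantization and the reverse mapping. It is only an illustration: the class name `QuantizationSketch`, the fixed range `[-1, 1]`, and the 7-bit bucket count are assumptions made for this example, and Lucene's actual quantizer chooses its bounds differently. It shows the 4x size difference (4 bytes per float dimension versus 1 byte) and that the recovered values are approximations of the originals.

```java
import java.util.Arrays;

public class QuantizationSketch {
    // Hypothetical fixed range for this example; not taken from Lucene.
    static final float MIN = -1.0f;
    static final float MAX = 1.0f;
    // 7-bit buckets: quantized values fall in 0..127 and fit in a signed byte.
    static final int LEVELS = 127;

    // Map each float component to the nearest bucket in [0, LEVELS].
    static byte[] quantize(float[] v) {
        byte[] out = new byte[v.length];
        float scale = LEVELS / (MAX - MIN);
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.max(MIN, Math.min(MAX, v[i]));
            out[i] = (byte) Math.round((clamped - MIN) * scale);
        }
        return out;
    }

    // Recover an approximate float vector from the quantized bytes.
    static float[] dequantize(byte[] q) {
        float[] out = new float[q.length];
        float step = (MAX - MIN) / LEVELS;
        for (int i = 0; i < q.length; i++) {
            out[i] = MIN + q[i] * step;
        }
        return out;
    }

    // Largest per-component difference between two equal-length vectors.
    static float maxAbsError(float[] a, float[] b) {
        float max = 0f;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        float[] raw = {0.12f, -0.53f, 0.98f, -1.0f};
        byte[] q = quantize(raw);
        float[] approx = dequantize(q);
        System.out.println("raw bytes:       " + raw.length * Float.BYTES);
        System.out.println("quantized bytes: " + q.length);
        System.out.println("approximation:   " + Arrays.toString(approx));
        System.out.println("max abs error:   " + maxAbsError(raw, approx));
    }
}
```

With this scheme the per-component error is bounded by about half a bucket width for inputs inside the range, which is the kind of loss we would be accepting when returning vectors rebuilt from quantized bytes.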