Pulkitg64 commented on PR #14792:
URL: https://github.com/apache/lucene/pull/14792#issuecomment-2977859146

   We are experimenting with large vector indexes. Because raw (unquantized) 
vectors consume significant disk space (about 4x more than quantized vectors), 
we want to drop the raw vectors from searcher machines. We currently use 
vector values for the following use cases:
   1. Calculating dot-product scores and returning them in search results
   2. Returning the vectors in search results
   3. Vector counting for metrics
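   To put the "4x" figure in concrete terms, here is a rough storage estimate. It is a sketch under stated assumptions: raw vectors are float32 (4 bytes per dimension), the quantized copy takes one byte per dimension plus one float corrective term per vector, and the HNSW graph and metadata are ignored. The corpus size and dimension in `main` are hypothetical.

```java
/**
 * Back-of-the-envelope storage estimate for one vector field.
 * Assumptions: float32 raw vectors; quantized vectors use 1 byte per
 * dimension plus one float32 corrective term per vector. Graph and
 * metadata overhead are not counted.
 */
public final class VectorStorageEstimate {
    /** Bytes needed for the raw float32 vectors. */
    public static long rawBytes(long numVectors, int dims) {
        return numVectors * dims * Float.BYTES;
    }

    /** Bytes needed for the quantized vectors (assumed layout, see above). */
    public static long quantizedBytes(long numVectors, int dims) {
        return numVectors * ((long) dims + Float.BYTES);
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // hypothetical corpus size
        int dims = 768;       // hypothetical embedding dimension
        long raw = rawBytes(n, dims);
        long quantized = quantizedBytes(n, dims);
        System.out.printf("raw=%d MiB quantized=%d MiB ratio=%.2f%n",
            raw >> 20, quantized >> 20, (double) raw / quantized);
    }
}
```

   The per-vector corrective term is why the ratio lands slightly under 4x rather than exactly at it.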
   
   For use case 1, we have started using 
[vectorScorer](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java#L444), 
which uses quantized vectors to compute scores, so we are covered there. For 
use cases 2 and 3, we currently use floatVectorValues obtained from 
[getFloatVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java#L189), 
but we need to switch to quantizedVectorValues because searchers will no 
longer have the float vectors. We are okay with accepting the accuracy loss 
from float-to-byte quantization.
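   For use case 2, returning vectors means turning the stored bytes back into approximate floats. The sketch below uses a generic min/max scalar quantizer to show where that accuracy loss comes from. The class and its fields are hypothetical; the scheme is similar in spirit to Lucene's ScalarQuantizer, but the exact rounding and corrective terms Lucene uses may differ.

```java
import java.util.Arrays;

/**
 * Hypothetical min/max scalar quantizer (not Lucene's API).
 * Floats in [minQuantile, maxQuantile] map to integers in [0, 2^bits - 1];
 * dequantization maps them back, losing at most alpha/2 per in-range value.
 */
public final class ScalarQuantSketch {
    final float minQuantile;
    final float maxQuantile;
    final float alpha; // width of one quantization step

    ScalarQuantSketch(float minQuantile, float maxQuantile, int bits) {
        if (bits < 1 || bits > 8) {
            throw new IllegalArgumentException("bits must be in [1, 8]");
        }
        this.minQuantile = minQuantile;
        this.maxQuantile = maxQuantile;
        this.alpha = (maxQuantile - minQuantile) / ((1 << bits) - 1);
    }

    byte[] quantize(float[] v) {
        byte[] q = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            // Clamp outliers into the quantile range, then snap to the nearest step.
            float clamped = Math.max(minQuantile, Math.min(maxQuantile, v[i]));
            q[i] = (byte) Math.round((clamped - minQuantile) / alpha);
        }
        return q;
    }

    float[] dequantize(byte[] q) {
        float[] v = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            // & 0xFF keeps 8-bit codes (128..255) non-negative.
            v[i] = (q[i] & 0xFF) * alpha + minQuantile;
        }
        return v;
    }

    public static void main(String[] args) {
        ScalarQuantSketch sq = new ScalarQuantSketch(-1f, 1f, 7);
        float[] original = {0.12f, -0.5f, 0.97f, -0.03f};
        float[] approx = sq.dequantize(sq.quantize(original));
        System.out.println(Arrays.toString(approx));
    }
}
```

   With 7 bits over [-1, 1], the step width alpha is 2/127, so each dequantized component should be within about 0.008 of the original value for in-range inputs.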
   
   To address these use cases, we see two options:
   * Introduce a new API, getQuantizedVectorValues, to access 
quantizedByteVector, OR
   * Use our local workaround: make the 
[QuantizedVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java#L408) 
class and its members public so the quantized vectors can be accessed directly
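   To make the first option concrete, here is one possible shape for such an API, applied to use case 3 (counting vectors). Every name in it (QuantizedByteVectorValues, QuantizedVectorsReader, countVectors, inMemory) is hypothetical and only illustrates the idea; it is not an existing Lucene interface. Returning null for a field with no vectors is likewise an assumption made for this sketch.

```java
import java.util.List;

/** Hypothetical API shape for reading quantized vectors; not Lucene's API. */
public final class QuantizedAccessSketch {
    /** Hypothetical per-field view over stored quantized vectors. */
    interface QuantizedByteVectorValues {
        int size();                  // number of vectors for this field in the segment
        int dimension();             // vector dimension
        byte[] vectorValue(int ord); // quantized bytes of the vector at ordinal ord
    }

    /** Hypothetical reader exposing the proposed getQuantizedVectorValues(field). */
    interface QuantizedVectorsReader {
        /** Returns null when the segment has no vectors for the field (assumed). */
        QuantizedByteVectorValues getQuantizedVectorValues(String field);
    }

    /** Use case 3: total vector count for a field across segments. */
    static long countVectors(List<QuantizedVectorsReader> segments, String field) {
        long total = 0;
        for (QuantizedVectorsReader reader : segments) {
            QuantizedByteVectorValues values = reader.getQuantizedVectorValues(field);
            if (values != null) {
                total += values.size();
            }
        }
        return total;
    }

    /** In-memory stand-in for a segment, used only for this sketch. */
    static QuantizedVectorsReader inMemory(String field, byte[][] vectors) {
        QuantizedByteVectorValues values = new QuantizedByteVectorValues() {
            public int size() { return vectors.length; }
            public int dimension() { return vectors.length == 0 ? 0 : vectors[0].length; }
            public byte[] vectorValue(int ord) { return vectors[ord]; }
        };
        return f -> f.equals(field) ? values : null;
    }

    public static void main(String[] args) {
        List<QuantizedVectorsReader> segments = List.of(
            inMemory("embedding", new byte[][] {{1, 2, 3}, {4, 5, 6}}),
            inMemory("embedding", new byte[][] {{7, 8, 9}}),
            inMemory("other", new byte[][] {{0, 0, 0}}));
        System.out.println(countVectors(segments, "embedding")); // prints 3
    }
}
```

   For use case 2, the same accessor could hand each vectorValue(ord) to a dequantization step before the vector is returned in search results.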
   
   I would like to know your thoughts on whether we should create such an API. 
If you think the above use cases don't justify a new API, what do you think 
about implementing the workaround and pushing it upstream?

