benwtrent commented on issue #13519:
URL: https://github.com/apache/lucene/issues/13519#issuecomment-2200052667

   My concern with 8-bit quantization is the algebraic expansion of the dot product and the corrective terms.
   
   For scalar quantization, the score corrections for dotProduct are derivable via some simple algebra, but I am not immediately aware of a way to handle the sign switch. I didn't dig deeper there, as int7 provides essentially the same recall. I am eager to see if 8-bit can be applied while keeping the score corrections.
   
   In case you need it, here is valuable background:
   
   https://www.elastic.co/search-labs/blog/scalar-quantization-101
   
   For some background on the small additional correction provided for int4 (or 
any scalar quantization where confidence_interval is set to `0`):
   
   
https://www.elastic.co/search-labs/blog/vector-db-optimized-scalar-quantization
   
   
   Let me see if I can answer all the other questions (sorry if I missed any; this is the 2nd thread related to scalar quantization and I might be conflating different things).
   
   > In terms of quantization, are we doing any extra processing for 4 and 7 
bits when compared to 8 bits ? I believe not.
   
   Typically not. But int4 honestly needs dynamic confidence intervals to work: you cannot statically set the confidence interval and still get good recall without a ton of oversampling. Setting the confidence_interval to `0` indicates that you want the quantiles to be dynamically calculated (rather than statically derived from some confidence interval).
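   As a toy illustration of what "dynamically calculated" quantiles mean (hypothetical code, far simpler than Lucene's actual optimizer), one can grid-search candidate clip points and keep the pair with the lowest quantization error:

```java
import java.util.Arrays;

// Toy sketch of dynamically picking quantiles (clip points), by
// grid-searching candidate lower/upper clips and keeping the pair with
// the lowest squared reconstruction error. Lucene's actual dynamic
// optimizer is more sophisticated; this only illustrates the idea
// behind confidence_interval = 0.
public class DynamicQuantiles {
  static double quantizationError(float[] values, float min, float max, int bits) {
    double alpha = (max - min) / ((1 << bits) - 1);
    double err = 0;
    for (float v : values) {
      float clipped = Math.min(Math.max(v, min), max);
      long level = Math.round((clipped - min) / alpha);
      double recon = min + alpha * level;  // de-quantized value
      err += (v - recon) * (v - recon);
    }
    return err;
  }

  static float[] bestClips(float[] values, int bits) {
    float[] sorted = values.clone();
    Arrays.sort(sorted);
    float[] best = {sorted[0], sorted[sorted.length - 1]};
    double bestErr = quantizationError(values, best[0], best[1], bits);
    // try trimming up to ~10% of the values from either tail
    int steps = sorted.length / 10;
    for (int lo = 0; lo <= steps; lo++) {
      for (int hi = 0; hi <= steps; hi++) {
        float min = sorted[lo], max = sorted[sorted.length - 1 - hi];
        if (min >= max) continue;
        double err = quantizationError(values, min, max, bits);
        if (err < bestErr) {
          bestErr = err;
          best = new float[] {min, max};
        }
      }
    }
    return best;
  }
}
```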
   
   > For 7 bits, how are we reducing memory usage compared to 8 bits. Are we 
doing any extra compression somewhere. Am I missing something ?
   
   No, we are not. int7 has nice SIMD performance properties for the dot product, and similarly nice properties can be kept for signed bytes as long as the values are limited to between `-127` and `127`.
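   The arithmetic behind that limit (my illustration; the real code lives in Lucene's Panama Vector scorers): capping magnitudes at 127 keeps a pair of accumulated products inside a signed 16-bit lane, which is what pairwise widening multiply-add SIMD instructions rely on.

```java
// Why capping at |127| is SIMD-friendly (illustrative arithmetic, not
// Lucene's Panama Vector code): with values in [-127, 127], a single
// product is at most 127 * 127 = 16129, so a pair of adjacent products
// (as accumulated by widening multiply-add instructions into 16-bit
// lanes) sums to at most 32258, which fits in a signed 16-bit lane.
// With the full signed-byte range [-128, 127], a pair can reach
// 2 * 128 * 128 = 32768 and overflow.
public class Int7Overflow {
  public static void main(String[] args) {
    int maxPairInt7 = 2 * 127 * 127;  // 32258
    int maxPairInt8 = 2 * 128 * 128;  // 32768
    System.out.println(maxPairInt7 <= Short.MAX_VALUE);  // safe
    System.out.println(maxPairInt8 <= Short.MAX_VALUE);  // overflows
  }
}
```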
   
   > For 4 bits, should we must set compress flag to True to [reduce the memory 
usage by about 50% 
](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java#L72-L89)(theoretically)
 compared to 8 bits ?
   
   Correct.
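   The ~50% saving comes from packing two 4-bit values into one byte. A minimal sketch of the idea (my own illustration; the actual nibble layout in `OffHeapQuantizedByteVectorValues` may order things differently):

```java
// Sketch of packing two 4-bit quantized values per byte, halving
// storage relative to one byte per value. Values must be in [0, 15].
public class NibblePack {
  static byte[] pack(byte[] quantized) {
    byte[] packed = new byte[(quantized.length + 1) / 2];
    for (int i = 0; i < quantized.length; i++) {
      int shift = (i & 1) == 0 ? 4 : 0;  // high nibble first
      packed[i / 2] |= (quantized[i] & 0x0F) << shift;
    }
    return packed;
  }

  static byte[] unpack(byte[] packed, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      int shift = (i & 1) == 0 ? 4 : 0;
      out[i] = (byte) ((packed[i / 2] >> shift) & 0x0F);
    }
    return out;
  }
}
```

   The trade-off, as noted below, is that scoring must unpack (or score directly on) nibbles, which is where the performance cost comes from.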
   
   > might be a dumb question. Does compress only works for 4 bits ?
   
   The only dumb question is an unasked one.
   
   Correct. Only for 4 bits. I seriously considered always compressing (and thus having no parameter at all), but the performance hit was too significant. I was able to close the gap significantly over time through targeted use of the Panama Vector APIs. I hope we can move away from having an "uncompressed" version once we get off-heap scoring for quantized vectors. I have a draft PR for this, but I am running into weird perf issues and haven't yet been able to dig more deeply.
   
   > I think this applies to the quantized vectors, which are (offheap) hot 
during searching.
   
   Absolutely correct.  
   
   > not sure how much RAM (I think also off-heap?) it will need vs the vectors
   
   Way, way less. The main cost is the vectors themselves. The graph is much smaller (we use delta & variable-length encoding for the neighbors). The graph size (per layer) is:
   
    - 1 `int` per vector in that layer
    - 1 `int` for the neighbor count, plus that vector's delta & variable-length encoded neighbors. This obviously changes based on the number of connections configured.
   
   Consider the WORST case (where the delta & variable-length encoding does NOTHING): the base layer has 32 connections, so that is 33 * 4 bytes per vector for the base layer of the graph, if it's fully connected. This is way less than the vectors themselves, as vector dimensions are usually many hundreds (384 is the smallest performant model I have found, e5-small).
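   Plugging in the numbers above (assuming 32 base-layer connections and 384-dim vectors, as with e5-small):

```java
// Worst-case back-of-envelope from the estimate above: base-layer
// graph bytes per vector vs. raw vector bytes, assuming 32 connections
// and 384 dimensions.
public class GraphVsVectors {
  public static void main(String[] args) {
    int maxConn = 32;
    int graphBytesPerVector = (1 + maxConn) * Integer.BYTES;  // 33 * 4 = 132
    int vectorBytesFloat32 = 384 * Float.BYTES;               // 1536
    int vectorBytesInt7 = 384;                                // 1 byte/dim quantized
    System.out.println(graphBytesPerVector);  // 132
    System.out.println(vectorBytesFloat32);   // 1536
    System.out.println(vectorBytesInt7);      // 384
  }
}
```

   Even against int7-quantized vectors (1 byte per dimension), the worst-case base-layer graph is only about a third of the vector storage; against float32 it is under a tenth.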


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

