benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2200052667
My concern for 8-bit quantization is the algebraic expansion of the dot product and the corrective terms. For scalar quantization, the score corrections for dot product are derivable via some simple algebra (sketched at the end of this comment), but I am not immediately aware of a way to handle the sign switch. I didn't bother digging deeper there as int7 provides basically the exact same recall. I am eager to see if 8-bit can be applied while keeping the score corrections.

In case you need it, here is valuable background: https://www.elastic.co/search-labs/blog/scalar-quantization-101

For some background on the small additional correction provided for int4 (or any scalar quantization where `confidence_interval` is set to `0`): https://www.elastic.co/search-labs/blog/vector-db-optimized-scalar-quantization

Let me see if I can answer all the other questions (sorry if I missed any; this is the 2nd thread related to scalar quantization and I might be conflating different things).

> In terms of quantization, are we doing any extra processing for 4 and 7 bits when compared to 8 bits? I believe not.

Typically not. But int4 honestly needs dynamic confidence intervals to work. You cannot statically set the confidence interval if you want good recall without a ton of oversampling. Setting the `confidence_interval` to `0` is an indication that you want the quantiles to be dynamically calculated (not statically calculated via some confidence interval). There is a configuration sketch at the end of this comment.

> For 7 bits, how are we reducing memory usage compared to 8 bits? Are we doing any extra compression somewhere? Am I missing something?

No, we are not. There are nice SIMD performance properties for the dot product, but similar nice properties can be applied if the signed byte values are limited to between `-127` and `127`.

> For 4 bits, must we set the compress flag to `true` to [reduce the memory usage by about 50%](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java#L72-L89) (theoretically) compared to 8 bits?

Correct.

> Might be a dumb question: does compress only work for 4 bits?

The only dumb question is an unasked one. Correct, only for 4 bits. I seriously considered always compressing (thus there being no parameter), but the performance hit was too significant. I was able to close the gap significantly over time through targeted Panama Vector APIs. I hope that we can move away from having an "uncompressed" version once we get off-heap scoring for quantized vectors. I have a draft PR for this, but I am running into weird perf issues and haven't yet been able to dig more deeply.

> I think this applies to the quantized vectors, which are (off-heap) hot during searching.

Absolutely correct.

> Not sure how much RAM (I think also off-heap?) it will need vs the vectors.

Way, way less. The main cost is the vectors themselves. The graph is way smaller (we do delta & variable encoding for the neighbors). The graph size per layer is:

- 1 `int` per vector in that layer
- 1 `int` and its delta & variably encoded neighbors

This obviously changes based on the number of connections configured. Consider the WORST case (where the delta & variable encoding does NOTHING): the base layer would have 32 connections, so that is 33 * 4 bytes per vector for the base layer of the graph if it is fully connected. This is way less than the vectors themselves, as vector dimensions are usually many hundreds (384 is the smallest performant model I have found, e5small).
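To make the "simple algebra" above concrete, here is a rough sketch of the expansion I mean, in my own notation and assuming a single shared quantile pair per segment (the two blog posts linked above derive this properly). With each component reconstructed as $x_i \approx \alpha q_i + m$, where the $q_i$, $p_i$ are the unsigned quantized values:

$$
x \cdot y \;\approx\; \sum_{i=1}^{d} (\alpha q_i + m)(\alpha p_i + m)
\;=\; \alpha^2 \sum_{i} q_i p_i \;+\; \alpha m \sum_{i} q_i \;+\; \alpha m \sum_{i} p_i \;+\; d\,m^2
$$

Everything except the integer dot product $\sum_i q_i p_i$ can be folded into per-vector corrective terms computed at index time. The sign switch I mention above is what makes me unsure this carries over cleanly once the quantized values are signed int8; this sketch only covers the unsigned case.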
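On the `confidence_interval = 0` and compress points, here is roughly how I would wire up an int4, compressed, dynamically-quantized field. This is a sketch from memory, so double-check the constructor argument order and the codec class against the javadocs of the Lucene version you are on:

```java
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;
import org.apache.lucene.index.IndexWriterConfig;

class Int4Config {
  static IndexWriterConfig int4Compressed() {
    IndexWriterConfig iwc = new IndexWriterConfig();
    iwc.setCodec(
        new Lucene99Codec() {
          @Override
          public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            // maxConn=16, beamWidth=100, single merge worker (so no executor),
            // bits=4, compress=true (two 4-bit values per byte),
            // confidenceInterval=0f -> quantiles are computed dynamically
            return new Lucene99HnswScalarQuantizedVectorsFormat(
                16, 100, 1, 4, true, 0f, null);
          }
        });
    return iwc;
  }
}
```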
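To make the ~50% number concrete: with `compress=true`, two 4-bit quantized values share a single byte, and reading them back requires unpacking; that unpacking is the performance hit I mentioned. A minimal illustration of the idea with hypothetical helpers (not the exact layout Lucene uses; the linked `OffHeapQuantizedByteVectorValues` code is the source of truth):

```java
// Pack two int4 values (each in 0..15) into one byte: roughly half the storage of one byte per value.
static byte packNibbles(int lo, int hi) {
  return (byte) ((lo & 0x0F) | ((hi & 0x0F) << 4));
}

// Unpack them again; this extra work at read time is where the decompression cost comes from.
static int[] unpackNibbles(byte packed) {
  return new int[] {packed & 0x0F, (packed >> 4) & 0x0F};
}
```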
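And a back-of-the-envelope on the graph-vs-vectors point, using the worst case above and a 384-dim model (illustrative numbers, not measurements):

```java
int maxConn = 32;                                         // base-layer connections, worst case
int graphBytesPerVector = (1 + maxConn) * Integer.BYTES;  // 33 * 4 = 132 bytes, assuming varint encoding saves nothing
int dims = 384;                                           // e.g. e5small
int rawBytesPerVector = dims * Float.BYTES;               // 1536 bytes of float32 per vector
int int7BytesPerVector = dims;                            // ~384 bytes quantized, plus a small per-vector correction
```

So even in this worst case, the base layer of the graph is roughly an order of magnitude smaller than the raw vectors, and still only a fraction of the quantized ones.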