Michael Sokolov created LUCENE-10577:
----------------------------------------

             Summary: Quantize vector values
                 Key: LUCENE-10577
                 URL: https://issues.apache.org/jira/browse/LUCENE-10577
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs
            Reporter: Michael Sokolov


The {{KnnVectorField}} API handles vectors with 4-byte floating-point values. 
These fields can be used (via {{KnnVectorsReader}}) in two main ways:

1. The {{VectorValues}} iterator enables retrieving the stored vector values
2. Approximate nearest-neighbor search
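
As a reminder, the first access path looks roughly like the sketch below; the 
class and field names here are just illustrative, not part of any proposal:

{code:java}
import java.io.IOException;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.VectorValues;
import org.apache.lucene.search.DocIdSetIterator;

class VectorDump {
  // Sketch only: iterate the vectors of one segment; today every value comes
  // back as a full-precision float[] (4 bytes per dimension).
  static void dumpVectors(LeafReader reader, String field) throws IOException {
    VectorValues values = reader.getVectorValues(field);
    if (values == null) {
      return; // no vectors indexed for this field
    }
    for (int doc = values.nextDoc();
        doc != DocIdSetIterator.NO_MORE_DOCS;
        doc = values.nextDoc()) {
      float[] vector = values.vectorValue();
      // ... consume the full-precision vector for doc ...
    }
  }
}
{code}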

The main point of this addition was to provide the search capability, and to 
support that it is not strictly necessary to store vectors in full precision. 
Users may also be willing to retrieve values at lower precision, for whatever 
purpose they serve, if that lets them store more samples. We know that 8 bits 
is enough to provide a very close approximation to the recall/performance 
tradeoff achieved with the full-precision vectors. I'd like to explore how we 
could enable 4:1 compression of these fields by reducing their precision.

A few ways I can imagine this being done:

1. Provide a parallel byte-oriented API. This would allow users to provide 
their data in reduced-precision format and give control over the quantization 
to them. It would have a major impact on the Lucene API surface though, 
essentially requiring us to duplicate all of the vector APIs.
2. Automatically quantize the stored vector data when we can. This would 
require no change, or only a very limited change, to the existing API to 
enable the feature.

I've been exploring (2), and what I find is that we can achieve very good 
recall results using dot-product similarity scoring with simple linear scaling 
+ quantization of the vector values, so long as we choose a scale that 
minimizes the quantization error. Dot-product is amenable to this treatment 
since vectors are required to be unit-length when used with that similarity 
function.
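
To make that concrete, the scaling + quantization step amounts to something 
like the following. This is only a sketch, not the actual patch; the method 
name and the fixed [-127, 127] byte range are my own choices:

{code:java}
class ScalarQuantizer {
  // Linearly map each component from [-scale, scale] onto signed bytes in
  // [-127, 127], rounding to the nearest integer and clamping outliers.
  static byte[] quantize(float[] vector, float scale) {
    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      int q = Math.round(vector[i] * (127f / scale));
      quantized[i] = (byte) Math.max(-127, Math.min(127, q));
    }
    return quantized;
  }
}
{code}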

Even so, there is variability in the ideal scale across different data sets. A 
good choice seems to be max(abs(min-value), abs(max-value)), but of course this 
assumes that the data set doesn't have a few outlier data points. A theoretical 
range can be obtained from 1/sqrt(dimension), but this is only useful when the 
samples are normally distributed. We could in theory determine the ideal scale 
when flushing a segment and manage this quantization per segment, but then 
numerical error could creep in when merging.
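
For example, the max-abs heuristic could be computed at flush time roughly as 
follows; again this is just a sketch, and in reality it would live in the 
codec's flush path rather than take a materialized list:

{code:java}
import java.util.List;

class ScaleChooser {
  // Scale = max(abs(min-value), abs(max-value)) over every component of every
  // vector in the segment. Simple, but sensitive to a few outlier components.
  static float chooseScale(List<float[]> vectors) {
    float maxAbs = 0f;
    for (float[] vector : vectors) {
      for (float component : vector) {
        maxAbs = Math.max(maxAbs, Math.abs(component));
      }
    }
    return maxAbs;
  }
}
{code}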

I'll post a patch/PR with an experimental setup I've been using for evaluation 
purposes. It is pretty self-contained and simple, but has some drawbacks that 
need to be addressed:

1. No automated mechanism for determining quantization scale (it's a constant 
that I have been playing with)
2. Converts from byte to float when computing the dot-product, instead of 
computing directly on the byte values (see the sketch after this list)
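
For (2), the goal would be an integer dot product computed directly on the 
quantized bytes, folding the scale back in once per comparison. Something along 
these lines, assuming the quantization sketched above:

{code:java}
class QuantizedDotProduct {
  // Dot product over quantized bytes. Each byte b approximates b * scale / 127,
  // so the integer sum maps back to the float dot product via (scale / 127)^2.
  static float dotProduct(byte[] a, byte[] b, float scale) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i]; // byte * byte widens to int; no float conversion
    }
    float unit = scale / 127f;
    return sum * unit * unit;
  }
}
{code}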

I'd like to get people's feedback on the approach, and on whether in general we 
should do this compression under the hood or expose a byte-oriented API. 
Whatever we do, I think a 4:1 compression ratio is pretty compelling, and we 
should pursue something.


