tveasey commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2652753715
This pull request relates only to OSQ, so the proper scope of discussion here is the concerns that have been raised around its attribution. We have pursued multiple conversations and discussions in order to resolve those concerns amicably, and we continue to welcome collaboration and open discourse with respect to each party's innovations in this problem space. We would like to reiterate that we highly rate the research in both RaBitQ and its extension. However, as we have stated in private communications, we maintain that OSQ is not a derivative of extended RaBitQ.

> In addition, Elastic shared a doc with us explaining the differences between OSQ and extended RaBitQ, but our understanding is that the core ideas of two methods, i.e., trying different parameters for scalar quantization on a per-vector basis, are similar.

This goes to the crux of the disagreement around attribution. In the following we describe the two schemes, extended RaBitQ [1] and OSQ [2], and restate the points we have made privately regarding their differences.

Both methods centre the data, which, to the best of our knowledge, was originally proposed in LVQ [3].

Extended RaBitQ proposes a method for constructing a codebook for which one can compute the dot product efficiently using integer arithmetic. In a nutshell it proceeds as follows:

1. Apply a random orthogonal transformation to the input vectors.
2. Normalise the resulting vectors and perform an efficient search for the nearest codebook centroid.
3. Snap the floating point vector to this codebook centroid and then undo the normalisation to obtain an estimate of the original dot product.

At its heart it is a variant of product quantization which allows for efficient similarity calculations, and it was positioned as such in the original paper.

The key ingredients of OSQ are as follows:

1. Construct per-vector quantization intervals, inherited from LVQ.
2. Formulate quantization as a hyperparameter optimization problem for the interval [a, b] used to construct the grid of possible quantized vectors (a short formal summary is given below), building on our earlier work which we published in April [4].
3. Perform an analysis showing how to choose [a, b] to minimize the expected square error in the dot product when the data are normally distributed, using sufficient statistics of the vector component distribution. (We perform an empirical study of the distribution of vector components in embedding models and show that they are typically normally distributed.)
4. Perform an iterative procedure based on coordinate descent which optimizes an anisotropic loss, based on the ideas in [5], with respect to the interval [a, b], initialising with the output of step 3.

At an algorithmic level there is no similarity between the methods other than that they both centre the data. At a conceptual level, RaBitQ was developed as a variant of product quantization, whereas OSQ was developed as an approach to hyperparameter optimization for per-vector scalar quantization.

For the record, we were inspired to revisit scalar quantization in light of the success of binary RaBitQ, and we were exploring ideas prior to any communication between our teams regarding extended RaBitQ. The actual inspiration for the approach was our previous work on hyperparameter optimisation for quantization intervals, and LVQ, and we attribute this PR accordingly.
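For concreteness, the formulation referred to in steps 2 to 4 can be summarised as follows. The notation here is chosen purely for illustration; the derivation, the closed-form initialisation and the exact objectives are in [2]. For a centred vector $x$ quantized to $B$ bits, OSQ restricts the reconstruction to the per-vector grid

$$\hat{x}_i(a, b) = a + \frac{b - a}{2^B - 1}\, q_i, \qquad q_i = \mathrm{round}\!\left(\frac{\mathrm{clamp}(x_i, a, b) - a}{(b - a)/(2^B - 1)}\right),$$

and treats both interval end points as per-vector hyperparameters,

$$(a^\ast, b^\ast) \in \arg\min_{a \le b}\; \ell(a, b),$$

where $\ell$ is the expected square error in the dot product under a normal model of the components (step 3), refined by coordinate descent on an anisotropic loss in the spirit of [5] (step 4), for example

$$\ell_\eta(a, b) = \eta\,\lVert r_\parallel \rVert^2 + \lVert r_\perp \rVert^2, \qquad r = x - \hat{x}(a, b),$$

with $r_\parallel$ the component of the quantization error along $x$, $r_\perp$ the orthogonal component, and $\eta > 1$ weighting the part of the error that directly affects the dot product.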
Retrospectively, and in parallel to our work, the authors of RaBitQ explored the relationship between extended RaBitQ and scalar quantization directly ([6], published 12/12/2024). They show that the act of normalising the vectors, finding the nearest centroid, then undoing the normalisation can be thought of as rescaling the quantization grid so that the nearest quantized vector ends up closer to the original floating point vector. Whilst this bears some relation to optimising b - a, they never formulate this process as an optimisation problem and at no point show exactly what quantity they in fact optimise. In our view, this misses the key point: the important idea is the formulation. As such, they do not and could not introduce a different optimisation objective, such as an anisotropic loss in the dot product. Even if they did, OSQ optimises both a and b directly, which gives it an extra degree of freedom, and develops an effective, non-trivial and completely distinct solution for this problem (see the sketch at the end of this comment).

As we have communicated privately, we dispute that OSQ is a variant of extended RaBitQ. We laid out in broadly similar terms exactly why we disagree at a technical level (along the lines of the discussion in this comment). We have received no further technical engagement with the points we have raised and have therefore reached an impasse.

Regarding this point:

> They claimed to have unlocked the performance gain for various bits. However, they ignored the fact that our extended RaBitQ method can already handle various bits, despite that the method was shared with Elastic on 25 Oct 2024 and published on arXiv in Sep 2024.

we at no point intended to imply that RaBitQ could not be extended to support more than 1 bit, nor that this was the only motivation for our work. The main original motivations are the benefits we see in being able to achieve high quality representations using a variant of vanilla scalar quantization, namely:

1. Simplicity, and
2. The speed with which we can compute quantized vectors for any compression rate.

We performed and published a study evaluating binary variants and showed consistent improvements in comparison to binary RaBitQ [2]. The study is entirely in the spirit of evaluating competing methods and covers multiple datasets and embedding models. Our initial focus was on binary quantization, since this was the only work we were releasing in product at that point. We privately discussed plans with the authors of RaBitQ to perform benchmarking for higher bit counts against extended RaBitQ and also to perform larger scale benchmarks. We were unaware of [4] at the time the blog [2] was written, and we communicated privately that we plan to publish a survey which includes a discussion of both works as well as other pertinent work in the field. We have furthermore stated privately that this feels like a more appropriate forum in which to discuss the two methods.
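Finally, to make the "extra degree of freedom" point concrete, the following is a minimal, illustrative sketch of per-vector interval optimisation by coordinate descent. It deliberately uses a plain squared reconstruction error and a min/max start purely to keep the example short; it is not the code in this PR, which initialises from the normal-model analysis and optimises the anisotropic loss discussed above.

```java
// Minimal sketch only: per-vector scalar quantization where BOTH interval end points
// [a, b] are free parameters, refined by coordinate descent on the squared
// reconstruction error. The PR itself differs (anisotropic loss, distribution-based
// initialisation); this is just the shape of the formulation.
final class IntervalSketch {

  /** Quantize x onto the grid defined by [a, b]; fills q and returns the squared error. */
  static double quantize(float[] x, double a, double b, int bits, int[] q) {
    int maxQ = (1 << bits) - 1;
    double step = (b - a) / maxQ;
    double err = 0;
    for (int i = 0; i < x.length; i++) {
      double c = Math.min(Math.max(x[i], a), b);   // clamp to the interval
      q[i] = (int) Math.round((c - a) / step);     // nearest grid point in [0, maxQ]
      double recon = a + step * q[i];
      err += (x[i] - recon) * (x[i] - recon);
    }
    return err;
  }

  /** Alternate between assigning grid points and re-solving for the best interval. */
  static double[] optimizeInterval(float[] x, int bits, int iters) {
    int maxQ = (1 << bits) - 1;
    double a = Double.POSITIVE_INFINITY, b = Double.NEGATIVE_INFINITY;
    for (float v : x) { a = Math.min(a, v); b = Math.max(b, v); }   // LVQ-style start
    if (b - a < 1e-12) return new double[] {a, b};
    int[] q = new int[x.length];
    for (int it = 0; it < iters; it++) {
      quantize(x, a, b, bits, q);
      // With q fixed, the reconstruction a + step * q[i] is affine in (a, step), so the
      // MSE-optimal interval is the solution of a 2x2 least-squares problem.
      double n = x.length, sq = 0, sqq = 0, sx = 0, sxq = 0;
      for (int i = 0; i < x.length; i++) {
        sq += q[i]; sqq += (double) q[i] * q[i]; sx += x[i]; sxq += (double) x[i] * q[i];
      }
      double det = n * sqq - sq * sq;
      if (det <= 0) break;                 // all components landed on one grid point
      double newA = (sx * sqq - sq * sxq) / det;
      double step = (n * sxq - sq * sx) / det;
      if (step <= 0) break;                // keep the interval ordered
      a = newA;
      b = newA + step * maxQ;
    }
    return new double[] {a, b};
  }
}
```

The point this is meant to illustrate is the formulation itself: both end points are optimised, and the objective can be swapped out (for example, for the anisotropic loss), neither of which falls out of rescaling a fixed codebook.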
[1] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search, https://arxiv.org/pdf/2409.09913
[2] Understanding optimized scalar quantization, https://www.elastic.co/search-labs/blog/scalar-quantization-optimization
[3] Similarity search in the blink of an eye with compressed indices, https://arxiv.org/abs/2304.04759
[4] Scalar quantization optimized for vector databases, https://www.elastic.co/search-labs/blog/vector-db-optimized-scalar-quantization
[5] Accelerating Large-Scale Inference with Anisotropic Vector Quantization, https://arxiv.org/pdf/1908.10396
[6] Extended RaBitQ: an Optimized Scalar Quantization Method, https://dev.to/gaoj0017/extended-rabitq-an-optimized-scalar-quantization-method-83m