tveasey commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2652753715
This pull request relates only to OSQ, so the proper scope of discussion here is the concerns that have been raised around its attribution. We have pursued multiple conversations and discussions in order to resolve those concerns amicably, and we continue to welcome collaboration and open discourse with respect to each party's innovations in this problem space. We would like to reiterate that we highly rate the research in both RaBitQ and its extension. However, as we have stated in private communications, we maintain that OSQ is not a derivative of extended RaBitQ.

> In addition, Elastic shared a doc with us explaining the differences between OSQ and extended RaBitQ, but our understanding is that the core ideas of two methods, i.e., trying different parameters for scalar quantization on a per-vector basis, are similar.

This goes to the crux of the disagreement around attribution. In the following we describe the two schemes, extended RaBitQ [1] and OSQ [2], and restate the points we have made privately regarding their differences.

Both methods centre the data, which, to the best of our knowledge, was originally proposed in LVQ [3].

Extended RaBitQ proposes a method for constructing a codebook for which one can compute the dot product efficiently using integer arithmetic. In a nutshell it proceeds as follows:

1. Apply a random orthogonal transformation to the input vectors.
2. Normalise the resulting vectors and perform an efficient search for the nearest codebook centroid.
3. Snap the floating point vector to this codebook centroid and then undo the normalisation to obtain an estimate of the original dot product.

At its heart it is a variant of product quantization which allows for efficient similarity calculations, and it was positioned as such in the original paper.

The key ingredients of OSQ are as follows:

1. Construct per-vector quantization intervals, inherited from LVQ.
2. Formulate quantization as a hyperparameter optimization problem for the interval [a, b] used to construct the grid of possible quantized vectors (a short formal summary is given below), building on our earlier work which we published in April [4].
3. Perform an analysis showing how to choose [a, b] to minimize the expected square error in the dot product when the data are normally distributed, using sufficient statistics of the vector component distribution. (We perform an empirical study of the distribution of vector components in embedding models and show that they are typically normally distributed.)
4. Perform an iterative procedure based on coordinate descent which optimizes an anisotropic loss, based on the ideas in [5], with respect to the interval [a, b], initialising with the output of step 3.

At an algorithmic level there is no similarity between the methods other than that they both centre the data. At a conceptual level, RaBitQ was developed as a variant of product quantization, whereas OSQ was developed as an approach to hyperparameter optimization for per-vector scalar quantization.

For the record, we were inspired to revisit scalar quantization in light of the success of binary RaBitQ, and we were exploring ideas prior to any communication between our teams regarding extended RaBitQ. The actual inspiration for the approach was our previous work on hyperparameter optimisation for quantization intervals, and LVQ, and we attribute this PR accordingly.
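For concreteness, the formulation referred to in steps 2 to 4 can be summarised as follows. The notation here is chosen purely for illustration; the derivation, the closed-form initialisation and the exact objectives are in [2]. For a centred vector $x$ quantized to $B$ bits, OSQ restricts the reconstruction to the per-vector grid

$$\hat{x}_i(a, b) = a + \frac{b - a}{2^B - 1}\, q_i, \qquad q_i = \mathrm{round}\!\left(\frac{\mathrm{clamp}(x_i, a, b) - a}{(b - a)/(2^B - 1)}\right),$$

and treats both interval end points as per-vector hyperparameters,

$$(a^\ast, b^\ast) \in \arg\min_{a \le b}\; \ell(a, b),$$

where $\ell$ is the expected square error in the dot product under a normal model of the components (step 3), refined by coordinate descent on an anisotropic loss in the spirit of [5] (step 4), for example

$$\ell_\eta(a, b) = \eta\,\lVert r_\parallel \rVert^2 + \lVert r_\perp \rVert^2, \qquad r = x - \hat{x}(a, b),$$

with $r_\parallel$ the component of the quantization error along $x$, $r_\perp$ the orthogonal component, and $\eta > 1$ weighting the part of the error that directly affects the dot product.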
Retrospectively, and in parallel to our work, the authors of RaBitQ explored the relationship between extended RaBitQ and scalar quantization directly ([6], published 12/12/2024). They show that the act of normalising the vectors, finding the nearest centroid, then undoing the normalisation can be thought of as rescaling the quantization grid so that the nearest quantized vector ends up closer to the original floating point vector. Whilst this bears some relation to optimising b - a, they never formulate this process as an optimisation problem and at no point show exactly what quantity they in fact optimise. In our view, this misses the key point: the important idea is the formulation. As such, they do not and could not introduce a different optimisation objective, such as an anisotropic loss in the dot product. Even if they did, OSQ optimises both a and b directly, which gives it an extra degree of freedom, and develops an effective, non-trivial and completely distinct solution for this problem (see the sketch at the end of this comment).

As we have communicated privately, we dispute that OSQ is a variant of extended RaBitQ. We laid out in broadly similar terms exactly why we disagree at a technical level (along the lines of the discussion in this comment). We have received no further technical engagement with the points we have raised and have therefore reached an impasse.

Regarding this point:

> They claimed to have unlocked the performance gain for various bits. However, they ignored the fact that our extended RaBitQ method can already handle various bits, despite that the method was shared with Elastic on 25 Oct 2024 and published on arXiv in Sep 2024.

we at no point intended to imply that RaBitQ could not be extended to support more than 1 bit, nor that this was the only motivation for our work. The main original motivations are the benefits we see in being able to achieve high quality representations using a variant of vanilla scalar quantization, namely:

1. Simplicity, and
2. The speed with which we can compute quantized vectors for any compression rate.

We performed and published a study evaluating binary variants and showed consistent improvements in comparison to binary RaBitQ [2]. The study is entirely in the spirit of evaluating competing methods and covers multiple datasets and embedding models. Our initial focus was on binary quantization, since this was the only work we were releasing in product at that point. We privately discussed plans with the authors of RaBitQ to perform benchmarking for higher bit counts against extended RaBitQ and also to perform larger scale benchmarks. We were unaware of [4] at the time the blog [2] was written, and we communicated privately that we plan to publish a survey which includes a discussion of both works as well as other pertinent work in the field. We have furthermore stated privately that this feels like a more appropriate forum in which to discuss the two methods.
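Finally, to make the "extra degree of freedom" point concrete, the following is a minimal, illustrative sketch of per-vector interval optimisation by coordinate descent. It deliberately uses a plain squared reconstruction error and a min/max start purely to keep the example short; it is not the code in this PR, which initialises from the normal-model analysis and optimises the anisotropic loss discussed above.

```java
// Minimal sketch only: per-vector scalar quantization where BOTH interval end points
// [a, b] are free parameters, refined by coordinate descent on the squared
// reconstruction error. The PR itself differs (anisotropic loss, distribution-based
// initialisation); this is just the shape of the formulation.
final class IntervalSketch {

  /** Quantize x onto the grid defined by [a, b]; fills q and returns the squared error. */
  static double quantize(float[] x, double a, double b, int bits, int[] q) {
    int maxQ = (1 << bits) - 1;
    double step = (b - a) / maxQ;
    double err = 0;
    for (int i = 0; i < x.length; i++) {
      double c = Math.min(Math.max(x[i], a), b);   // clamp to the interval
      q[i] = (int) Math.round((c - a) / step);     // nearest grid point in [0, maxQ]
      double recon = a + step * q[i];
      err += (x[i] - recon) * (x[i] - recon);
    }
    return err;
  }

  /** Alternate between assigning grid points and re-solving for the best interval. */
  static double[] optimizeInterval(float[] x, int bits, int iters) {
    int maxQ = (1 << bits) - 1;
    double a = Double.POSITIVE_INFINITY, b = Double.NEGATIVE_INFINITY;
    for (float v : x) { a = Math.min(a, v); b = Math.max(b, v); }   // LVQ-style start
    if (b - a < 1e-12) return new double[] {a, b};
    int[] q = new int[x.length];
    for (int it = 0; it < iters; it++) {
      quantize(x, a, b, bits, q);
      // With q fixed, the reconstruction a + step * q[i] is affine in (a, step), so the
      // MSE-optimal interval is the solution of a 2x2 least-squares problem.
      double n = x.length, sq = 0, sqq = 0, sx = 0, sxq = 0;
      for (int i = 0; i < x.length; i++) {
        sq += q[i]; sqq += (double) q[i] * q[i]; sx += x[i]; sxq += (double) x[i] * q[i];
      }
      double det = n * sqq - sq * sq;
      if (det <= 0) break;                 // all components landed on one grid point
      double newA = (sx * sqq - sq * sxq) / det;
      double step = (n * sxq - sq * sx) / det;
      if (step <= 0) break;                // keep the interval ordered
      a = newA;
      b = newA + step * maxQ;
    }
    return new double[] {a, b};
  }
}
```

The point this is meant to illustrate is the formulation itself: both end points are optimised, and the objective can be swapped out (for example, for the anisotropic loss), neither of which falls out of rescaling a fixed codebook.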
[1] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search, https://arxiv.org/pdf/2409.09913
[2] Understanding optimized scalar quantization, https://www.elastic.co/search-labs/blog/scalar-quantization-optimization
[3] Similarity search in the blink of an eye with compressed indices, https://arxiv.org/abs/2304.04759
[4] Scalar quantization optimized for vector databases, https://www.elastic.co/search-labs/blog/vector-db-optimized-scalar-quantization
[5] Accelerating Large-Scale Inference with Anisotropic Vector Quantization, https://arxiv.org/pdf/1908.10396
[6] Extended RaBitQ: an Optimized Scalar Quantization Method, https://dev.to/gaoj0017/extended-rabitq-an-optimized-scalar-quantization-method-83m