gaoj0017 commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2550510539
Hi @benwtrent, I am the first author of the [RaBitQ paper](https://arxiv.org/abs/2405.12497) and [its extended version](https://arxiv.org/abs/2409.09913). As your team knows, our RaBitQ method brings breakthrough performance on binary quantization and scalar quantization.

We noticed that this pull request mentions a method which individually optimizes the lower bound and upper bound of scalar quantization. This idea is highly similar to our idea of individually searching for the optimal rescaling factor of scalar quantization, as described in our [extended RaBitQ](https://arxiv.org/abs/2409.09913) paper, which we shared with your team in Oct 2024. An intuitive explanation can be found in our recent [blog](https://dev.to/gaoj0017/extended-rabitq-an-optimized-scalar-quantization-method-83m). The mathematical equivalence between these two ideas is given in Remark 2.

In addition, the contribution of our [RaBitQ](https://arxiv.org/abs/2405.12497) has not been properly acknowledged in several other places. For example, a previous post from Elastic - [Better Binary Quantization (BBQ) in Lucene and Elasticsearch](https://www.elastic.co/search-labs/blog/better-binary-quantization-lucene-elasticsearch) - introduces the major features of BBQ, yet it does not make clear that all these features originate from our [RaBitQ paper](https://arxiv.org/abs/2405.12497). In a [press release](https://www.elastic.co/blog/whats-new-elasticsearch-platform-8-16-0), Elastic claims that "Elasticsearch’s new BBQ algorithm redefines vector quantization". However, BBQ is not a fundamentally new method but a variant of RaBitQ with some minor adaptations. We note that when a breakthrough is made, it is always easy to derive variants of it or to restate the method in different terms. One should not present a variant as a new method under a new name while ignoring the contribution of the original method.
We hope that you will understand our concern and properly acknowledge the contributions of RaBitQ and its extension in your pull requests and/or blogs.

* **Remark 1**. The BBQ feature fails on the GIST dataset because it removes the randomization operation of the RaBitQ method. With the randomization operation, RaBitQ is theoretically guaranteed to perform stably on all datasets.
* **Remark 2**. Let $B$ be the number of bits for scalar quantization. Scalar quantization can be represented in two equivalent ways.
  1. Scalar quantization can be determined by the lower bound $v_l$ and the upper bound $v_r$. The algorithm first computes $\Delta = (v_r - v_l) / (2^{B}-1)$ and then maps each real value $x$ to the nearest integer to $(x - v_l) / \Delta$.
  2. Based on the process above, scalar quantization can be equivalently determined by a rescaling factor $\Delta$ and a shifting factor $v_l$.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
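The two parameterizations in Remark 2 can be illustrated with a minimal Python sketch. It is not taken from the RaBitQ papers or from the Lucene pull request: the function names are hypothetical, clamping codes to $[0, 2^{B}-1]$ is an added assumption that Remark 2 does not state, and Python's `round` breaks exact ties toward the even integer, which is one of several possible readings of "nearest integer". The sketch only shows that both forms yield the same codes, because $v_r = v_l + (2^{B}-1)\Delta$ lets either pair of parameters be recovered from the other.

```python
def _quantize(xs, delta, v_l, bits):
    """Map each value to the nearest integer to (x - v_l) / delta.

    Clamping to [0, 2**bits - 1] is an assumption of this sketch,
    not something stated in Remark 2.
    """
    max_code = 2**bits - 1
    return [min(max(round((x - v_l) / delta), 0), max_code) for x in xs]


def quantize_by_bounds(xs, v_l, v_r, bits):
    """Form 1: parameterized by lower bound v_l and upper bound v_r."""
    delta = (v_r - v_l) / (2**bits - 1)
    return _quantize(xs, delta, v_l, bits)


def quantize_by_scale_shift(xs, delta, v_l, bits):
    """Form 2: parameterized by rescaling factor delta and shift v_l."""
    return _quantize(xs, delta, v_l, bits)


def bounds_to_scale_shift(v_l, v_r, bits):
    """Convert (v_l, v_r) into the equivalent (delta, v_l)."""
    return (v_r - v_l) / (2**bits - 1), v_l


def scale_shift_to_bounds(delta, v_l, bits):
    """Convert (delta, v_l) back into the equivalent (v_l, v_r)."""
    return v_l, v_l + delta * (2**bits - 1)


if __name__ == "__main__":
    values = [-1.0, -0.2, 0.6, 2.0]
    delta, shift = bounds_to_scale_shift(-1.0, 2.0, bits=2)
    print(quantize_by_bounds(values, -1.0, 2.0, bits=2))
    print(quantize_by_scale_shift(values, delta, shift, bits=2))
```

Under these assumptions, optimizing over $(v_l, v_r)$ and optimizing over $(\Delta, v_l)$ search the same family of quantizers; only the coordinates differ.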