gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2550510539

   Hi @benwtrent, I am the first author of the [RaBitQ 
paper](https://arxiv.org/abs/2405.12497) and [its extended 
version](https://arxiv.org/abs/2409.09913). As your team knows, our RaBitQ 
method brings breakthrough performance to both binary quantization and scalar 
quantization.
    
   We notice that in this pull request you mention a method that individually 
optimizes the lower bound and the upper bound of scalar quantization. This idea is 
highly similar to our idea of individually searching for the optimal rescaling 
factor of scalar quantization, as described in our [extended 
RaBitQ](https://arxiv.org/abs/2409.09913) paper, which we shared with your team 
in Oct 2024. An intuitive explanation can be found in our recent 
[blog](https://dev.to/gaoj0017/extended-rabitq-an-optimized-scalar-quantization-method-83m).
The mathematical equivalence between the two ideas is explained in Remark 2 below.
    
   In addition, the contribution of our 
[RaBitQ](https://arxiv.org/abs/2405.12497) has not been properly acknowledged 
in several other places. For example, a previous Elastic post, [Better 
Binary Quantization (BBQ) in Lucene and 
Elasticsearch](https://www.elastic.co/search-labs/blog/better-binary-quantization-lucene-elasticsearch),
introduces the major features of BBQ but does not make clear that all of 
these features originate from our [RaBitQ 
paper](https://arxiv.org/abs/2405.12497). In a [press 
release](https://www.elastic.co/blog/whats-new-elasticsearch-platform-8-16-0), 
Elastic claims that "Elasticsearch’s new BBQ algorithm redefines vector 
quantization". However, BBQ is not a fundamentally new method; it is a variant of 
RaBitQ with some minor adaptations.
    
   We note that once a breakthrough is made, it is always easy to derive 
variants of it or to restate the method in different terms. One should not present 
a variant as a new method under a new name while ignoring the contribution of the 
original method. We hope that you will understand our concern and properly 
acknowledge the contributions of RaBitQ and its extension in your pull 
requests and/or blogs.
    
   * **Remark 1**. The BBQ feature fails on the GIST dataset because it removes 
the randomization operation of the RaBitQ method. With the randomization 
operation, RaBitQ is theoretically guaranteed to perform stably on all 
datasets. 
   * **Remark 2**. Let $B$ be the number of bits used for scalar quantization. 
     Scalar quantization can be represented in two equivalent ways.
      1. By the lower bound $v_l$ and the upper bound $v_r$. The algorithm first 
         computes $\Delta = (v_r - v_l) / (2^{B} - 1)$ and then maps each real 
         value $x$ to the integer nearest to $(x - v_l) / \Delta$.
      2. By a rescaling factor $\Delta$ and a shifting factor $v_l$. Following the 
         process above, these two parameters determine the quantization 
         equivalently, since the upper bound is recovered as 
         $v_r = v_l + (2^{B} - 1)\Delta$.
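To make the equivalence in Remark 2 concrete, here is a minimal Python sketch (an illustration only, not code from the RaBitQ papers or from Lucene). The function names `quantize`, `quantize_bounds`, `to_scale_shift`, and `to_bounds` are hypothetical. Two behaviors are assumptions not stated in the remark: codes are clamped to $[0, 2^{B}-1]$ for values outside $[v_l, v_r]$, and ties are broken by Python's round-half-to-even `round`.

```python
def quantize(x, delta, v_l, B):
    """Parameterization 2: rescaling factor delta and shifting factor v_l."""
    levels = 2**B - 1
    codes = []
    for xi in x:
        q = round((xi - v_l) / delta)  # nearest integer (Python rounds ties to even)
        codes.append(min(max(q, 0), levels))  # clamping is an assumption, not in the remark
    return codes


def quantize_bounds(x, v_l, v_r, B):
    """Parameterization 1: lower bound v_l and upper bound v_r."""
    delta = (v_r - v_l) / (2**B - 1)
    return quantize(x, delta, v_l, B)


def to_scale_shift(v_l, v_r, B):
    """(v_l, v_r) -> (delta, v_l)."""
    return (v_r - v_l) / (2**B - 1), v_l


def to_bounds(delta, v_l, B):
    """(delta, v_l) -> (v_l, v_r), using v_r = v_l + (2^B - 1) * delta."""
    return v_l, v_l + (2**B - 1) * delta


x = [-1.0, -0.2, 0.35, 0.9, 1.0]
delta, shift = to_scale_shift(-1.0, 1.0, B=2)
print(quantize_bounds(x, -1.0, 1.0, B=2))  # [0, 1, 2, 3, 3]
print(quantize(x, delta, shift, B=2))      # same codes from (delta, v_l)
```

Optimizing the pair $(v_l, v_r)$ and optimizing the pair $(\Delta, v_l)$ therefore search over the same family of quantizers; only the coordinates differ.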


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
