benwtrent closed pull request #13651: Add a Better Binary Quantizer format for
dense vectors
URL: https://github.com/apache/lucene/pull/13651
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spec
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2549771070
Closing this PR in deference to this one:
https://github.com/apache/lucene/pull/14078
An evolution of scalar quantization proved more flexible and provided better
recall in our
gaoj0017 commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2542789836
Thanks, Tanya @tanyaroosta, for sharing our blog about RaBitQ in this
thread. I am the first author of the [RaBitQ
paper](https://arxiv.org/abs/2405.12497). I am glad to know that our
tanyaroosta commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2532887197
FYI, a blog post on RaBitQ:
https://dev.to/gaoj0017/quantization-in-the-counterintuitive-high-dimensional-space-4feg
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470937060
I conducted a benchmark using Cohere's 768-dimensional data. Here are the
steps I followed for reproducibility:
1. **Set up** the [luceneutil
repository](https://github.com
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470503074
Quick update: we have been bothered by some of the numbers (for example,
models like "gist" perform poorly), and we have some improvements to get done
first before flipping back to "r
mikemccand commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470394946
> @ShashwatShivam I don't think there is a "memory column" provided
> anywhere. I simply looked at the individual file sizes (veb, vex) and summed
> their sizes together.
Once this
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2466635037
Hey @benwtrent,
Thank you for all your help so far! I have a question about the oversampling
used to increase recall. From what I understand, it scales up the top-k and
fanout