Re: [PR] Add a Better Binary Quantizer format for dense vectors [lucene]

2024-12-17 Thread via GitHub
benwtrent closed pull request #13651: Add a Better Binary Quantizer format for dense vectors URL: https://github.com/apache/lucene/pull/13651

2024-12-17 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2549771070 Closing this PR in deference to this one: https://github.com/apache/lucene/pull/14078 An evolution of scalar quantization proved more flexible and provided better recall in our

2024-12-13 Thread via GitHub
gaoj0017 commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2542789836 Thanks, Tanya (@tanyaroosta), for sharing our blog about RaBitQ in this thread. I am the first author of the [RaBitQ paper](https://arxiv.org/abs/2405.12497). I am glad to know that our

2024-12-10 Thread via GitHub
tanyaroosta commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2532887197 FYI, a blog post on RaBitQ: https://dev.to/gaoj0017/quantization-in-the-counterintuitive-high-dimensional-space-4feg

2024-11-12 Thread via GitHub
ShashwatShivam commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470937060 I conducted a benchmark using Cohere's 768-dimensional data. Here are the steps I followed for reproducibility: 1. **Set up** the [luceneutil repository](https://github.com

2024-11-12 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470503074 Quick update: we have been bothered by some of the numbers (for example, models like "gist" perform poorly), and we have some improvements to get done first before flipping back to "r

2024-11-12 Thread via GitHub
mikemccand commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470394946 > @ShashwatShivam I don't think there is a "memory column" provided anywhere. I simply looked at the individual file sizes (veb, vex) and summed their sizes together. Once this
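The file-size approach quoted above (summing the `veb` and `vex` files) could be scripted roughly as follows. This is only a sketch: the function name `quantized_vector_bytes` and the idea of pointing it at a single index directory are assumptions for illustration, not part of luceneutil or Lucene.

```python
from pathlib import Path


def quantized_vector_bytes(index_dir, extensions=(".veb", ".vex")):
    """Return the total size in bytes of files in index_dir with the given suffixes.

    Hypothetical helper: it only looks at the top level of index_dir and
    matches on file suffix, which is an assumption about the index layout.
    """
    total = 0
    for path in Path(index_dir).iterdir():
        # Count regular files only, and only those with a matching suffix.
        if path.is_file() and path.suffix in extensions:
            total += path.stat().st_size
    return total
```

Calling `quantized_vector_bytes("/path/to/index")` on a hypothetical index directory would give the summed on-disk size, which is only a proxy for memory use.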

2024-11-10 Thread via GitHub
ShashwatShivam commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2466635037 Hey @benwtrent, Thank you for all your help so far! I have a question about the oversampling used to increase recall. From what I understand, it scales up the top-k and fanout
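As a rough illustration of the oversampling idea described in the question above, the sketch below scales top-k and fanout by a factor and then trims the candidates back to the requested top-k. Everything here is an assumption for illustration: the names `oversampled_params` and `rerank_and_trim`, the use of a single multiplicative factor, and the final rescoring step are hypothetical and are not taken from the PR or from luceneutil.

```python
import math


def oversampled_params(top_k, fanout, oversample):
    """Scale top_k and fanout by an oversample factor (assumed to be >= 1)."""
    if oversample < 1:
        raise ValueError("oversample must be >= 1")
    # Round up so that a fractional factor never shrinks the candidate pool.
    return math.ceil(top_k * oversample), math.ceil(fanout * oversample)


def rerank_and_trim(candidates, top_k):
    """Keep the top_k candidates by score, highest first.

    candidates is a list of (doc_id, score) pairs, where score is assumed
    to come from rescoring with the full-precision vectors.
    """
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
```

For example, `oversampled_params(10, 100, 2.0)` yields a search for 20 candidates with a fanout of 200, after which `rerank_and_trim` would cut the result list back to 10.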