lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710872661
@benwtrent @mikemccand I really appreciate your help and quick responses.
May I also ask about the selection of datasets being used for the
benchmarks? How do you choose them? Why I'm
mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710460602
> `fanout` makes the search queue when searching the HNSW graph larger.
However, the searcher will still only return `k` results. So, searching for top
`k=10` with `fanout=20` indicat
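To make the `k` versus `fanout` distinction above concrete, here is a minimal, hypothetical Java sketch of a single-layer best-first graph search (in the style of the HNSW layer search, not Lucene's actual `HnswGraphSearcher`). It assumes squared Euclidean distance and a plain adjacency-list graph; the class and method names are invented for illustration. The candidate beam holds `k + fanout` nodes, but only the best `k` are returned.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical sketch: the beam holds k + fanout nodes, but only k results are returned. */
public class FanoutSearchSketch {

  /** Squared Euclidean distance; smaller means more similar. */
  static float distance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Best-first search over one graph layer; returns up to k node ids, nearest first. */
  static int[] search(float[][] vectors, int[][] neighbors, int entry,
                      float[] query, int k, int fanout) {
    int beamSize = k + fanout; // fanout widens the beam, not the result count
    Comparator<Integer> byDistance = Comparator.comparingDouble(id -> distance(vectors[id], query));
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);   // nearest on top
    PriorityQueue<Integer> beam = new PriorityQueue<>(byDistance.reversed()); // farthest on top
    Set<Integer> visited = new HashSet<>();
    visited.add(entry);
    candidates.add(entry);
    beam.add(entry);

    while (!candidates.isEmpty()) {
      int current = candidates.poll();
      float worst = distance(vectors[beam.peek()], query);
      if (beam.size() >= beamSize && distance(vectors[current], query) > worst) {
        break; // the closest unexplored candidate cannot improve a full beam
      }
      for (int nb : neighbors[current]) {
        if (!visited.add(nb)) {
          continue; // already seen
        }
        float d = distance(vectors[nb], query);
        if (beam.size() < beamSize || d < distance(vectors[beam.peek()], query)) {
          candidates.add(nb);
          beam.add(nb);
          if (beam.size() > beamSize) {
            beam.poll(); // evict the farthest node to keep the beam bounded
          }
        }
      }
    }

    // The beam holds up to k + fanout nodes; keep only the best k.
    List<Integer> sorted = new ArrayList<>(beam);
    sorted.sort(byDistance);
    int count = Math.min(k, sorted.size());
    int[] top = new int[count];
    for (int i = 0; i < count; i++) {
      top[i] = sorted.get(i);
    }
    return top;
  }
}
```

Raising `fanout` costs more distance computations but gives the search more room to find true neighbors; the returned array never holds more than `k` ids.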
mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710473036
> > Could you please also share other parameters of your benchmark (ndoc,
maxConn, beamWidthIndex, fanout, etc.)
>
> I have lost my test environment and I regrettably didn't wri
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2713454213
Hey @lpld
> May I also ask about the selection of datasets being used for the
benchmarks? How do you choose them?
I haven't tested with SIFT, though be sure to use euclid
mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710482360
> @lpld quantizing is done per segment, at flush and merge time. So it takes
into account live vectors in the segment during flush and merge.
>
> I don't see why adding/updating
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2710314969
@lpld quantizing is done per segment, at flush and merge time. So it takes
into account live vectors in the segment during flush and merge.
I don't see why adding/updating/deleti
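A minimal, hypothetical Java sketch of what "taking into account live vectors in the segment" can mean for a per-segment statistic. It assumes the centroid from the PR description's "centroid centered vectors" is computed per segment, consistent with quantization being done per segment; the `boolean[] live` array is a stand-in for Lucene's live-docs bitset, and all names are invented for illustration.

```java
/** Hypothetical sketch: per-segment centroid computed over live (non-deleted) vectors only. */
public class SegmentCentroidSketch {

  static float[] liveCentroid(float[][] vectors, boolean[] live, int dims) {
    double[] sum = new double[dims]; // accumulate in double to limit rounding error
    int count = 0;
    for (int doc = 0; doc < vectors.length; doc++) {
      if (!live[doc]) {
        continue; // deleted docs are skipped, so they do not skew the centroid
      }
      for (int i = 0; i < dims; i++) {
        sum[i] += vectors[doc][i];
      }
      count++;
    }
    float[] centroid = new float[dims];
    if (count == 0) {
      return centroid; // no live vectors: fall back to the origin
    }
    for (int i = 0; i < dims; i++) {
      centroid[i] = (float) (sum[i] / count);
    }
    return centroid;
  }
}
```

In this sketch, because the statistic is recomputed from whatever is live when a segment is written, adds and deletes are reflected the next time the affected segments are flushed or merged.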
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2709087834
@benwtrent A short question again. Is this quantization approach in
principle applicable when my data is constantly changing, i.e. new vectors are
being added and old vectors removed from
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2706558384
@benwtrent This makes sense, thank you!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the s
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2706428269
@lpld
I agree, both are doing similar things but there are some important
distinctions.
`oversample` indicates that you are going to return that ratio more results
from
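A common reading of oversampling with quantized vectors (an assumption here, not a description of the luceneutil code) is: gather `k * oversample` candidates using cheap quantized scores, then rerank only those candidates with exact, full-precision scores and return the top `k`. This hypothetical Java sketch illustrates that two-phase idea; all names are invented.

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: oversample with approximate scores, then rerank exactly. */
public class OversampleRerankSketch {

  static int[] search(float[] approxScores, IntToDoubleFunction exactScore, int k, float oversample) {
    int numDocs = approxScores.length;
    int numCandidates = Math.min(numDocs, (int) Math.ceil(k * oversample));

    // Phase 1: rank every doc by its cheap, approximate (e.g. quantized) score.
    Integer[] ids = new Integer[numDocs];
    for (int i = 0; i < numDocs; i++) {
      ids[i] = i;
    }
    Arrays.sort(ids, (a, b) -> Float.compare(approxScores[b], approxScores[a]));
    Integer[] candidates = Arrays.copyOf(ids, numCandidates);

    // Phase 2: compute the expensive exact score only for the candidates.
    double[] exact = new double[numDocs];
    for (int id : candidates) {
      exact[id] = exactScore.applyAsDouble(id);
    }
    Arrays.sort(candidates, (a, b) -> Double.compare(exact[b], exact[a]));

    int[] top = new int[Math.min(k, numCandidates)];
    for (int i = 0; i < top.length; i++) {
      top[i] = candidates[i];
    }
    return top;
  }
}
```

Unlike `fanout`, which widens the graph search itself, oversampling in this sketch only widens the pool handed to the exact reranking step.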
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2702238939
@benwtrent Thanks for your response; it was quite helpful. Could you please
also share other parameters of your benchmark (ndoc, maxConn, beamWidthIndex,
fanout, etc.)?
I was able t
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2698684914
@lpld here are my luceneutil changes:
https://github.com/mikemccand/luceneutil/pull/348
> What exactly do the numbers in the description of this pull request mean?
When you say
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2698569925
Hi @benwtrent
Thanks again for your previous comment. I was able to modify luceneutil and
run some benchmarks. I am quite new to Lucene, so I would appreciate some help
in understan
benwtrent merged PR #14078:
URL: https://github.com/apache/lucene/pull/14078
gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2692580673
Thank you for acknowledging that our extended RaBitQ method proposes the
idea of exploring different scalar quantization parameters on a per-vector
basis for the first time and OSQ adop
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2684703383
@benwtrent Thanks for your reply!
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2681113395
@benwtrent Thanks for your reply!
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2679008833
@gaoj0017
> The OSQ method (introduced in this PR) has its major idea similar to our
extended RaBitQ method and our extended RaBitQ method is a prior art which
achieves good ac
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2679022162
> I wonder where can I find the code for the benchmarks that you are
mentioning in the description? Thanks!
@lpld I patched a version of luceneutil, sort of like this:
https://
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2678033882
Hi @benwtrent, that's an amazing amount of work.
I wonder where can I find the code for the benchmarks that you are
mentioning in the description? Thanks!
gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2664639299
As we have consistently emphasized in both public and private
communications, we are concerned that the **OSQ method employs an idea
highly similar to the one presented in our [extended
tveasey commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2652753715
This pull request relates only to OSQ, and thus the proper scope of
discussion is regarding the concerns raised around its attribution.
We have pursued multiple conversations and d
gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2644556568
After Elastic’s last round of replies, the Elastic team reached out to us for
clarification on the issues via Zoom meetings. In the meetings, they promised
to fix the misattribution, so we sus
github-actions[bot] commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2626006980
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
tveasey commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2573444675
Just sticking purely to the issues raised regarding this PR and the blog Ben
linked explaining the methodology...
> Although the RaBitQ approach is conceptually rather different to
ChrisHegarty commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2573298347
In my capacity as the Lucene PMC Chair (and with explicit acknowledgment of
my current employment with Elastic, as of the date of this writing), I want to
emphasize that proper attr
gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2573030521
Hi @msokolov , the discussion here is not only about the blog posts but also
related to the pull request here. In this pull request (and its related blogs),
it claims a new method witho
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2569565637
To head this off, this implementation is not an evolution of RaBitQ in any
way. It's intellectually dishonest to say it's an evolution of RaBitQ. I know
that's pedantic, but it's a fac
mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2569246977
+1 for proper attribution.
We should give credit where credit is due. The evolution of this PR clearly
began with the RaBitQ paper, as seen in the [opening comment on the origi
msokolov commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2565588002
@gaoj0017 it sounds to me as if your concern is about lack of attribution in
the blog post you mentioned, and doesn't really relate to this pull request
(code change) - is that accurate
gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2562753433
@benwtrent Thanks for your reply.
First, in the blog - [Better Binary Quantization at Elastic and
Lucene](https://www.elastic.co/search-labs/blog/better-binary-quantization-lucen
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2555050252
@gaoj0017 Thank you for your feedback!
Truly, y'all inspired us to improve scalar quantization. RaBitQ showed
that it is possible to achieve a 32x reduction while achieving high
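For reference, the 32x figure is plain storage arithmetic: a `float32` component takes 32 bits, while a 1-bit code takes one. For a hypothetical 768-dimensional vector, ignoring any per-vector correction terms a format may also store:

```latex
\frac{768 \times 32\ \text{bits}}{768 \times 1\ \text{bit}}
  = \frac{24576\ \text{bits}}{768\ \text{bits}}
  = \frac{3072\ \text{bytes}}{96\ \text{bytes}}
  = 32
```

The ratio does not depend on the dimension; any small per-vector extras would make the real-world ratio slightly lower.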
mayya-sharipova commented on code in PR #14078:
URL: https://github.com/apache/lucene/pull/14078#discussion_r1890793535
##
lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsFormat.java:
##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Soft
mayya-sharipova commented on code in PR #14078:
URL: https://github.com/apache/lucene/pull/14078#discussion_r1890779090
##
lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsFormat.java:
##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Soft
mayya-sharipova commented on code in PR #14078:
URL: https://github.com/apache/lucene/pull/14078#discussion_r1890778000
##
lucene/core/src/java/org/apache/lucene/codecs/lucene102/package-info.java:
##
@@ -0,0 +1,436 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
gaoj0017 commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2550510539
Hi @benwtrent , I am the first author of the [RaBitQ
paper](https://arxiv.org/abs/2405.12497) and [its extended
version](https://arxiv.org/abs/2409.09913). As your team have known, our
benwtrent opened a new pull request, #14078:
URL: https://github.com/apache/lucene/pull/14078
This provides a binary vector format for vectors. The key ideas are:
- Centroid centered vectors
- Asymmetric quantization
- Individually optimized scalar quantization
This all
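The three ideas listed above can be illustrated with a small, hypothetical Java sketch (Java 16+ for the `record`). It is a simplification, not the `Lucene102BinaryQuantizedVectorsFormat` code: "individually optimized" is approximated by giving each vector its own min/max interval rather than an optimized one, and "asymmetric" is shown by quantizing stored vectors to 1 bit but queries to 4 bits (bit widths chosen for illustration). Scores are computed on decoded values for readability; an efficient implementation would more likely work directly on the integer codes plus correction terms.

```java
/**
 * Hypothetical sketch, NOT the Lucene102 implementation: the per-vector
 * interval is simply the vector's own min/max, a stand-in for an optimized one.
 */
public class CenteredAsymmetricQuantSketch {

  /** Integer codes plus the per-vector interval needed to decode them. */
  record Quantized(int[] codes, float lo, float hi, int bits) {
    float[] decode() {
      float step = (hi - lo) / ((1 << bits) - 1);
      float[] out = new float[codes.length];
      for (int i = 0; i < codes.length; i++) {
        out[i] = lo + codes[i] * step;
      }
      return out;
    }
  }

  /** Centering: subtract the (segment) centroid from a raw vector. */
  static float[] center(float[] v, float[] centroid) {
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] - centroid[i];
    }
    return out;
  }

  /** Per-vector scalar quantization to `bits` bits per dimension. */
  static Quantized quantize(float[] v, int bits) {
    float lo = Float.POSITIVE_INFINITY;
    float hi = Float.NEGATIVE_INFINITY;
    for (float x : v) {
      lo = Math.min(lo, x);
      hi = Math.max(hi, x);
    }
    float step = (hi - lo) / ((1 << bits) - 1);
    int[] codes = new int[v.length];
    for (int i = 0; i < v.length; i++) {
      codes[i] = step == 0f ? 0 : Math.round((v[i] - lo) / step);
    }
    return new Quantized(codes, lo, hi, bits);
  }

  /** Asymmetric scoring: estimates the dot product of the centered vectors. */
  static float approxDot(float[] doc, float[] query, float[] centroid) {
    float[] d = quantize(center(doc, centroid), 1).decode();  // stored side: 1 bit
    float[] q = quantize(center(query, centroid), 4).decode(); // query side: 4 bits
    float sum = 0f;
    for (int i = 0; i < d.length; i++) {
      sum += d[i] * q[i];
    }
    return sum;
  }
}
```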