benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2744242199
> do you confirm that, according to your knowledge, any relevant and active
work toward multi-valued vectors in Lucene is effectively aggregated here?
@alessandrobenedetti I thin
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2744585265
Re: using long for graph node ids, I can see how using int ordinals can be
limiting for the number of vectors we can index per segment. However, adapting to
long node ids is also a non-
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2751872315
> Another option I was pondering is adding a new field type dedicated to
multi-valued vectors.
I tried this in my first stab at this issue
(https://github.com/apache/lucene/pu
alessandrobenedetti commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2751006045
> > do you confirm that, according to your knowledge, any relevant and
active work toward multi-valued vectors in Lucene is effectively aggregated
here?
>
> @alessandr
alessandrobenedetti commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2743201362
@vigyasharma, from a first superficial pass, I see that this PR touches
similar points of my original outdated one:
https://github.com/apache/lucene/pull/12314, but it see
vigyasharma commented on code in PR #14173:
URL: https://github.com/apache/lucene/pull/14173#discussion_r2008411867
##
lucene/core/src/java/org/apache/lucene/util/hnsw/UpdatableScoreHeap.java:
##
Review Comment:
I'd like to keep the logic to update scores for already ingest
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2744562872
Thanks for looking into this PR @alessandrobenedetti, this is the latest
iteration on multi-vector support.
It does build on the same central idea of assigning a unique ordina
alessandrobenedetti commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2743148001
Catching up on this and trying to understand how far we are now from my
original idea and implementation:
https://github.com/apache/lucene/pull/12314
Obviously, my c
alessandrobenedetti commented on code in PR #14173:
URL: https://github.com/apache/lucene/pull/14173#discussion_r2007476642
##
lucene/core/src/java/org/apache/lucene/util/hnsw/UpdatableScoreHeap.java:
##
Review Comment:
For example, what are the benefits of this in comparis
benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2647965600
> I meant that since we'd be writing new implementations for buildGraph,
merging, etc., it might be easier to account for long nodeIds from the get-go
Ah, I understand and I
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2646869747
> I don't understand how DiskANN would solve any of the previously expressed
problems.
No, it wouldn't solve any of these problems. I meant that since we'd be
writing a new imp
benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2634182175
> Java limits the size of arrays (and lists) to 'int max' and does not allow
'long' array indices. These will need to be changed to use a different data
structure.
Yeah, I don't
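
The quoted point about Java arrays being limited to int indices can be illustrated with a minimal sketch of one possible "different data structure": a paged array addressed by a long index. The class name `PagedIntArraySketch` and its methods are hypothetical and are not part of Lucene or of this PR; this only shows the general idea under that assumption.

```java
/**
 * Hypothetical sketch (not Lucene code): an int store addressed by a long
 * index, built from fixed-size pages so that no single Java array needs
 * more than Integer.MAX_VALUE elements.
 */
final class PagedIntArraySketch {
    private final int pageShift;   // log2 of the page size
    private final int pageMask;    // pageSize - 1, extracts the in-page offset
    private final int[][] pages;
    private final long size;

    PagedIntArraySketch(long size, int pageShift) {
        if (size < 0 || pageShift < 1 || pageShift > 30) {
            throw new IllegalArgumentException("invalid size or pageShift");
        }
        this.size = size;
        this.pageShift = pageShift;
        this.pageMask = (1 << pageShift) - 1;
        int pageSize = 1 << pageShift;
        // Number of pages needed, rounding up; the page count itself must fit in an int.
        int numPages = Math.toIntExact((size + pageSize - 1) >>> pageShift);
        pages = new int[numPages][];
        for (int p = 0; p < numPages; p++) {
            long remaining = size - ((long) p << pageShift);
            // The last page may be shorter than pageSize.
            pages[p] = new int[(int) Math.min(pageSize, remaining)];
        }
    }

    long size() {
        return size;
    }

    int get(long index) {
        checkIndex(index);
        return pages[(int) (index >>> pageShift)][(int) (index & pageMask)];
    }

    void set(long index, int value) {
        checkIndex(index);
        pages[(int) (index >>> pageShift)][(int) (index & pageMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + " out of range for size " + size);
        }
    }
}
```

A small page shift is only useful for experimenting; a realistic page size would be much larger, and every caller that currently indexes by int would still need to be adapted.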
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2629167044
> I think this PR is still doing globally unique ordinals for vectors? So,
vectors 1, 2, 3 go to document 1 and ordinals 4, 5 go to doc 2? If so, I think
we should "bite the bullet"
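
The "globally unique ordinals" scheme described in the quote above can be sketched as a prefix-sum table from documents to vector ordinals. This is only an illustration: `OrdinalToDocSketch`, `buildStarts`, and `docForOrdinal` are hypothetical names, not the PR's implementation, and the sketch uses 0-based ordinals and doc ids where the quote counts from 1.

```java
/**
 * Hypothetical sketch (not the PR's code): maps globally unique, 0-based
 * vector ordinals back to the document that owns them.
 */
final class OrdinalToDocSketch {
    private OrdinalToDocSketch() {}

    /**
     * starts[d] is the first ordinal owned by doc d;
     * starts[numDocs] is the total number of vectors.
     */
    static int[] buildStarts(int[] vectorsPerDoc) {
        int[] starts = new int[vectorsPerDoc.length + 1];
        for (int d = 0; d < vectorsPerDoc.length; d++) {
            // addExact fails loudly if the total no longer fits in an int ordinal.
            starts[d + 1] = Math.addExact(starts[d], vectorsPerDoc[d]);
        }
        return starts;
    }

    /** Binary search for the last doc d with starts[d] <= ordinal. */
    static int docForOrdinal(int[] starts, int ordinal) {
        int total = starts[starts.length - 1];
        if (ordinal < 0 || ordinal >= total) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal + " out of range");
        }
        int lo = 0;
        int hi = starts.length - 2; // index of the last doc
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= ordinal) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

With `vectorsPerDoc = {3, 2}`, ordinals 0 to 2 resolve to doc 0 and ordinals 3 and 4 resolve to doc 1, mirroring the example in the quote. The `Math.addExact` call makes the int limit on the total vector count explicit, which is the concern behind the long node id discussion.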
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2627799689
> I also don't understand the recall change between parentJoin on main vs.
parentJoin in your branch.
The parentJoin on my branch runs with merges disabled, and loads the extr
benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2624759420
I like where this PR is going.
> Note: This change does not include dependent multi-valued vectors like
ColBERT, where the multiple vectors must be used together to compute similari
benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2621652581
> For parentJoin benchmark run on main, there is a visible drop in recall
when I disable merges (as compared to a main branch run with merges enabled).
Is this expected?
I wonde
vigyasharma commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2620016142
Ran some early benchmarks to compare this flat-storage-based multi-vector
approach with the existing parent-join approach. I would appreciate any
feedback on the approach, benchmark
vigyasharma opened a new pull request, #14173:
URL: https://github.com/apache/lucene/pull/14173
Another take at #12313
The following PR adds support for _independent_ multi-vectors, i.e.
scenarios where a single document is represented by multiple independent vector
values. The most