date:20241108

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-11-08 Thread via GitHub

vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2466073115 I tried to find some blogs and benchmarks on other library implementations. Astra Db, Vespa, faiss and nmslib, all seem to support multi-vectors in some form. From what I can

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2465450838 Yeah I think I will stop working on this. Where I've ended up is with suspicions about the ScorerSupplier/Scorer setup. Because we allocate Scorers *a lot* we rely on that being a chea

Re: [I] CheckIndex could assert more things about the HNSW graph? [lucene]

2024-11-08 Thread via GitHub

benchaplin commented on issue #13724: URL: https://github.com/apache/lucene/issues/13724#issuecomment-2465431906 I've been interested in working on some HNSW stuff so I took a shot at this. I added only the two checks you mentioned to start, but I'm planning to look through some issues and

[PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-08 Thread via GitHub

benchaplin opened a new pull request, #13984: URL: https://github.com/apache/lucene/pull/13984 ### Description This is a first pass on #13724 - I've added the following checks: - verify neighbor list is in order - verify no duplicate neighbors The code was adapted from [

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2465408658 > [the last commit](https://github.com/apache/lucene/pull/13872/commits/ef13bade9b3ab8eb5de35e7297a9468ec45fc745) demonstrates re-using the resource-intensive bits (RandomVectorScor

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

msokolov commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1834763594 ## lucene/core/src/java/org/apache/lucene/index/FloatVectorValues.java: ## @@ -118,6 +123,9 @@ public Floats vectors() { public float[] get(int ord) throws

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2465298380 [the last commit](https://github.com/apache/lucene/pull/13872/commits/ef13bade9b3ab8eb5de35e7297a9468ec45fc745) demonstrates re-using the resource-intensive bits (RandomVectorScorer and

[I] KnnFloatVectorQuery#toString should show the filter [lucene]

2024-11-08 Thread via GitHub

jpountz opened a new issue, #13983: URL: https://github.com/apache/lucene/issues/13983 ### Description I was looking at the `toString()` of a `KnnFloatVectorQuery` and got surprised that it was not pre-filtered, when I later discovered that it actually was, `toString()` was just omit

[PR] Remove duplicate test code [lucene]

2024-11-08 Thread via GitHub

LuXugang opened a new pull request, #13982: URL: https://github.com/apache/lucene/pull/13982 ### Description The only diff between `doIterate1 ` and `doIterate2 ` is `bb < b.length()` in `doIterate1 `, but this condition seems always true, so should we remove it? -- This is an auto

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2465028558 >As a side note, I see that IndexInput is Closeable -- but I don't think we ever close() the ones we allocate? Is this a problem? Not a problem per se. The creator of the scor

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-11-08 Thread via GitHub

krickert commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2464836106 I would love to see a single knn field that supports multiple vectors. Right now I feel like doing the embedded docs or a child collection to handle these use cases feel a little too

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2464749787 OK; re allocating vectors in supplier, yes that can't work because of aliasing across threads. I also found we do not adequately test this case. There are *many* things in here that ar

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-11-08 Thread via GitHub

benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2464679186 @ShashwatShivam I don't think there is a "memory column" provided anywhere. I simply looked at the individual file sizes (veb, vex) and summed their sizes together. -- This is an au

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2464638368 I am now completely confused by this change. And I think that my last commit broke the threading model - even though all tests pass!. I'm not saying that we should not do some refac

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2464599913 Looking again in a bit more detail, I dislike that it is not possible to get access to the vector values without *always* creating at least one slice. Previously this was not necess

[PR] Change vector input from IndexInput to RandomAccessInput [lucene]

2024-11-08 Thread via GitHub

dungba88 opened a new pull request, #13981: URL: https://github.com/apache/lucene/pull/13981 ### Description This PR changes the input of vector reader from IndexInput to RandomAccessInput, as we mostly random-access the vectors with seek() followed by readFloats(). One thing

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2464539084 @msokolov ok, this is that I was thinking - [f1e0007](https://github.com/apache/lucene/pull/13872/commits/f1e0007b15245db3a048ebf5bf340e70929f21b7). There might be other places too,

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-11-08 Thread via GitHub

ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2464371689 The threading and reuse model in the current API is subtle, but works well, as it mostly minimises the potential for garbage. Looking at this again, it's almost like anywhere

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [I] CheckIndex could assert more things about the HNSW graph? [lucene]

[PR] Add some basic HNSW graph checks to CheckIndex [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

[I] KnnFloatVectorQuery#toString should show the filter [lucene]

[PR] Remove duplicate test code [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

[PR] Change vector input from IndexInput to RandomAccessInput [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

18 matches

Site Navigation

Mail list logo

Footer information