vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2439876776
Thanks @benwtrent. I've been working on getting a multi-vector benchmark running to wire this end to end, and found some pesky bugs and oversights along the way.

I'm planning to split this feature into multiple smaller PRs. This PR was mainly to get input on the approach; it's too big to test and review. I'll share a plan for the split PRs soon.

Re: the multi-vector benchmark for the passage search use case, I've been stuck on a bug where I hit an `EOFException` when reading the last multi-vector document through `DenseOffHeapMultiVectorValues`. I could definitely use some help here. If you plan to take a look, you can use the code in this PR (I'll push my fixes) and the multi-vector benchmark code from [here](https://github.com/vigyasharma/luceneutil/tree/multivec).

```java
Exception in thread "main" java.lang.RuntimeException: java.io.EOFException: read past EOF: MemorySegmentIndexInput(path="/Users/vigyas/forks/bench/util/knnIndices/cohere-wikipedia-docs-768d.vec-32-50-multiVector.index/_0_Lucene99HnswMultiVectorsFormat_0.vecmv") [slice=multi-vector-data]
	at knn.KnnGraphTester$ComputeBaselineNNFloatTask.call(KnnGraphTester.java:1115)
	at knn.KnnGraphTester.computeNN(KnnGraphTester.java:967)
	at knn.KnnGraphTester.getNN(KnnGraphTester.java:812)
	at knn.KnnGraphTester.run(KnnGraphTester.java:438)
	at knn.KnnGraphTester.runWithCleanUp(KnnGraphTester.java:177)
	at knn.KnnGraphTester.main(KnnGraphTester.java:172)
Caused by: java.io.EOFException: read past EOF: MemorySegmentIndexInput(path="/Users/vigyas/forks/bench/util/knnIndices/cohere-wikipedia-docs-768d.vec-32-50-multiVector.index/_0_Lucene99HnswMultiVectorsFormat_0.vecmv") [slice=multi-vector-data]
	at org.apache.lucene.store.MemorySegmentIndexInput.readByte(MemorySegmentIndexInput.java:146)
	at org.apache.lucene.store.DataInput.readInt(DataInput.java:95)
	at org.apache.lucene.store.MemorySegmentIndexInput.readInt(MemorySegmentIndexInput.java:261)
	at org.apache.lucene.store.DataInput.readFloats(DataInput.java:202)
	at org.apache.lucene.store.MemorySegmentIndexInput.readFloats(MemorySegmentIndexInput.java:231)
	at org.apache.lucene.codecs.lucene99.OffHeapFloatMultiVectorValues.vectorValue(OffHeapFloatMultiVectorValues.java:111)
	at org.apache.lucene.codecs.lucene99.OffHeapFloatMultiVectorValues.vectorValue(OffHeapFloatMultiVectorValues.java:130)
	at org.apache.lucene.codecs.hnsw.DefaultFlatMultiVectorScorer$FloatMultiVectorScorer.score(DefaultFlatMultiVectorScorer.java:185)
	at org.apache.lucene.codecs.lucene99.OffHeapFloatMultiVectorValues$DenseOffHeapMultiVectorValues$1.score(OffHeapFloatMultiVectorValues.java:248)
	at org.apache.lucene.search.AbstractKnnVectorQuery.exactSearch(AbstractKnnVectorQuery.java:220)
	at knn.KnnFloatVectorBenchmarkQuery.exactSearch(KnnFloatVectorBenchmarkQuery.java:33)
	at knn.KnnFloatVectorBenchmarkQuery.runExactSearch(KnnFloatVectorBenchmarkQuery.java:50)
	at knn.KnnGraphTester$ComputeBaselineNNFloatTask.call(KnnGraphTester.java:1111)
	... 5 more
```