vigyasharma commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2744562872
Thanks for looking into this PR, @alessandrobenedetti. This is the latest iteration on multi-vector support. It builds on the same central idea of assigning a unique ordinal to each vector and mapping multiple ordinals to a single doc. I tried a few other approaches, but this one seemed cleanest.

I think the key difference from #12314 is the change to store metadata that lets us map multiple ordinals to a single doc. This is implemented in `MultiVectorOrdConfiguration` using `DirectMonotonicWriter/Reader`. For every doc, I maintain the ordinal of its first vector (`baseOrdinal`) along with the number of vectors in the doc, and use these to do the `ordToDoc` mapping for vectors.

I didn't fully understand how this was done in your original PR, specifically how it mapped an ordinal back to its docId, given that we can have a variable number of vectors per doc. Maybe I missed something. If you had a simpler implementation, I'm happy to circle back to it.

I also added an `allVectorValues()` API to `Byte|FloatVectorValues`, which I think will help at query time. Other than this, the changes are mostly around integrating multi-vector support and will likely have a lot of overlap.
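
To make the `baseOrdinal` idea concrete, here is a minimal Java sketch of mapping a vector ordinal back to its doc when each doc can hold a variable number of vectors. It is not the code from the PR: the class `OrdToDocSketch` and its methods are hypothetical, a plain `long[]` stands in for whatever `DirectMonotonicWriter/Reader` would store on disk, and it assumes every listed doc has at least one vector.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the PR's MultiVectorOrdConfiguration. */
final class OrdToDocSketch {
  private final int[] docIds;        // docs that have vectors, in increasing order
  private final long[] baseOrdinals; // ordinal of the first vector of each doc
  private final long totalVectors;

  OrdToDocSketch(int[] docIds, int[] vectorCounts) {
    if (docIds.length != vectorCounts.length) {
      throw new IllegalArgumentException("docIds and vectorCounts must align");
    }
    this.docIds = docIds.clone();
    this.baseOrdinals = new long[docIds.length];
    long next = 0;
    for (int i = 0; i < vectorCounts.length; i++) {
      if (vectorCounts[i] < 1) {
        // Assumption of this sketch: docs without vectors are simply not listed.
        throw new IllegalArgumentException("each listed doc needs at least one vector");
      }
      baseOrdinals[i] = next; // strictly increasing, so a monotonic encoding fits
      next += vectorCounts[i];
    }
    this.totalVectors = next;
  }

  /** Number of vectors in the doc at position docIndex. */
  long vectorCount(int docIndex) {
    long end = docIndex + 1 < baseOrdinals.length ? baseOrdinals[docIndex + 1] : totalVectors;
    return end - baseOrdinals[docIndex];
  }

  /** Maps a vector ordinal to the docId that owns it. */
  int ordToDoc(long ord) {
    if (ord < 0 || ord >= totalVectors) {
      throw new IndexOutOfBoundsException("ord=" + ord);
    }
    int idx = Arrays.binarySearch(baseOrdinals, ord);
    if (idx < 0) {
      // Not a base ordinal: binarySearch returns -(insertionPoint) - 1,
      // and the owning doc sits just before the insertion point.
      idx = -idx - 2;
    }
    return docIds[idx];
  }
}
```

With docs 3, 7 and 8 holding 2, 1 and 3 vectors, the base ordinals are 0, 2 and 3, so ordinals 0 and 1 map to doc 3, ordinal 2 to doc 7, and ordinals 3 through 5 to doc 8. The binary search is one possible lookup strategy chosen for this sketch; the PR may resolve the mapping differently.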