alessandrobenedetti commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2751006045
> > do you confirm that, according to your knowledge, any relevant and active work toward multi-valued vectors in Lucene is effectively aggregated here?
>
> @alessandrobenedetti I think so. This is the latest stab at it.
>
> > Main concern is still related to ordinals to become long as far as I can see :)
>
> Indeed, I just don't see how Lucene can actually support multi-value vectors without switching to long ordinals for the vectors. Otherwise, we enforce some limitation on the number of vectors per segment, or some limitation on the number of vectors per doc (e.g. every doc can only have 256/65535 vectors).
>
> Making HNSW indexing & merging ~2x (given other constants, it might not be exactly 2x, maybe a little less) more expensive for heap usage is a pretty steep cost. Especially for something I am not sure how many folks will actually use.

I agree: I don't think it makes sense to degrade single-valued performance at all. (I haven't investigated that myself, but I trust your judgement on the impact of moving ordinals from int to long; let me know if you want me to double-check.)

Another option I was considering is adding a new field type dedicated to multi-valued vectors. Sure, there would be tons of classes to "duplicate" and make multi-valued compliant, but I believe we'll be able to reuse most of the code: a huge number of classes, but (hopefully) not a massive amount of new code.

Before even exploring this, I want to check more carefully how native multi-valued vectors would compare with the current parent-join approach. Native support needs to bring real advantages (mostly faster retrieval of the top-K 'parent' documents); if it doesn't, this huge amount of work won't make much sense.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
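The trade-off in the quoted reply (long ordinals versus a cap on vectors per segment or per doc) can be made concrete with a small Java sketch. It is purely illustrative: `OrdinalPackingSketch`, `packInt`, `packLong` and `level0NeighborBytes` are hypothetical names, not Lucene APIs, and the packing scheme is an assumption for the sake of the arithmetic, not a description of how Lucene assigns vector ordinals. Packing a (docId, vectorIndex) pair into an int keeps ordinals at 4 bytes but splits the 31 usable bits between docs and vectors per doc; a long ordinal removes that ceiling. The last method counts only level-0 neighbor ordinals, assuming up to `2 * maxConn` neighbors per node.

```java
// Hypothetical sketch: none of these names are Lucene APIs.
final class OrdinalPackingSketch {

    // Bits reserved for a vector's position within its doc (8 bits => at most 256 vectors per doc).
    static final int INDEX_BITS = 8;
    // Bits left for the docId so the packed ordinal stays a non-negative int (31 - 8 = 23).
    static final int DOC_BITS = Integer.SIZE - 1 - INDEX_BITS;

    // Packs (docId, vectorIndex) into one int ordinal; fails when either part exceeds its bit budget.
    static int packInt(int docId, int vectorIndex) {
        if (docId < 0 || docId >= (1 << DOC_BITS)) {
            throw new IllegalArgumentException("docId exceeds " + DOC_BITS + " bits: " + docId);
        }
        if (vectorIndex < 0 || vectorIndex >= (1 << INDEX_BITS)) {
            throw new IllegalArgumentException("more than " + (1 << INDEX_BITS) + " vectors in doc " + docId);
        }
        return (docId << INDEX_BITS) | vectorIndex;
    }

    static int docOf(int ord) {
        return ord >>> INDEX_BITS;
    }

    static int indexOf(int ord) {
        return ord & ((1 << INDEX_BITS) - 1);
    }

    // With a long ordinal, docId and vectorIndex each get a full 32 bits.
    static long packLong(int docId, int vectorIndex) {
        return ((long) docId << 32) | (vectorIndex & 0xFFFFFFFFL);
    }

    static int docOfLong(long ord) {
        return (int) (ord >>> 32);
    }

    static int indexOfLong(long ord) {
        return (int) ord;
    }

    // Bytes spent on level-0 neighbor ordinals only (assumes up to 2 * maxConn neighbors per node).
    static long level0NeighborBytes(long numNodes, int maxConn, int bytesPerOrdinal) {
        return numNodes * 2L * maxConn * bytesPerOrdinal;
    }

    public static void main(String[] args) {
        int ord = packInt(1000, 3);
        System.out.println("doc=" + docOf(ord) + " index=" + indexOf(ord)); // doc=1000 index=3

        // Vector index 70,000 would not fit the 8-bit budget of packInt, but fits in a long ordinal.
        long wide = packLong(1000, 70_000);
        System.out.println("doc=" + docOfLong(wide) + " index=" + indexOfLong(wide)); // doc=1000 index=70000

        long asInt = level0NeighborBytes(1_000_000, 16, Integer.BYTES);
        long asLong = level0NeighborBytes(1_000_000, 16, Long.BYTES);
        System.out.println(asInt + " vs " + asLong + " bytes"); // 128000000 vs 256000000 bytes
    }
}
```

With 8 index bits, an int ordinal allows at most 2^23 (8,388,608) docs with up to 256 vectors each; any other split just moves the limit between the two. For the ordinal arrays alone the long variant costs exactly 2x, while scores and other per-node structures in a real graph would dilute that ratio, consistent with the "maybe a little less" in the quote.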