vigyasharma commented on PR #13991: URL: https://github.com/apache/lucene/pull/13991#issuecomment-2472019578
I am thinking we can leverage the `NONE` aggregation (in #13525) for non-ColBERT passage-vector use cases by making each graph node correspond to a single value in the multi-vector, i.e. index-time aggregation becomes "none". The resulting graph would be similar to what we construct with parent-child docs today, while flat storage with multi-vectors could allow aggregated similarity checks at *query* time. This could help with recall while making multi-vector usage easier (no overquery or index-time joins).

It'll need some work: a mechanism to address each vector value directly, and corresponding changes in VectorValues. I'm thinking maybe an "ordinal" for the multi-vector and a "sub-ordinal" for values within the multi-vector. Both ints could be packed into a long for the node value?

Since I haven't chalked out all the details yet, I decided to **remove** the `NONE` aggregation for now and focus first on the ColBERT use case. Will go with the `isMultiVector` flag in FieldInfos to identify multi-vector storage requirements.

cc: @cpoerschke, @benwtrent

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
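
For illustration, the ordinal / sub-ordinal packing floated above could look roughly like the sketch below. This is only a minimal sketch, not code from the PR: the class name `MultiVectorNodeId` and its methods are hypothetical, and it assumes both values are non-negative ints, with the multi-vector ordinal in the high 32 bits and the sub-ordinal in the low 32 bits.

```java
// Hypothetical helper, not part of Lucene: packs a multi-vector ordinal and a
// sub-ordinal (index of one value inside that multi-vector) into one long.
final class MultiVectorNodeId {
  private MultiVectorNodeId() {}

  /** High 32 bits hold the ordinal, low 32 bits hold the sub-ordinal. */
  static long pack(int ordinal, int subOrdinal) {
    // Mask the low word so sign extension cannot overwrite the high word.
    return ((long) ordinal << 32) | (subOrdinal & 0xFFFFFFFFL);
  }

  /** Recovers the multi-vector ordinal from a packed node id. */
  static int ordinal(long nodeId) {
    return (int) (nodeId >>> 32);
  }

  /** Recovers the sub-ordinal (value index within the multi-vector). */
  static int subOrdinal(long nodeId) {
    return (int) nodeId;
  }
}
```

With this layout, `MultiVectorNodeId.pack(7, 3)` round-trips to ordinal 7 and sub-ordinal 3, and for non-negative inputs the packed longs sort by ordinal first and sub-ordinal second, so all values of one multi-vector stay adjacent.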