alessandrobenedetti commented on PR #12314: URL: https://github.com/apache/lucene/pull/12314#issuecomment-1602090466
> Thinking more on this implementation. It seems like we will need at a minimum a new `NeighborQueue`.
>
> I am not sure the existing one needs to be updated, but we instead should have a `MultiValueNeighborQueue`.
>
> The reason for this is that not only does this queue contain information about result sets, it also keeps track of how many nodes are visited, and the TopHits returned utilize that number. Consequently, the visited count should keep track of documents visited, not vectors visited. All these changes indicate a new queueing mechanism for multi-valued vector fields.
>
> Another thought is that Lucene already has the concept of index `join` values, effectively creating child document IDs under a single parent. This allows for even greater flexibility by indexing the passage the vector represents, and potentially even greater scoring flexibility.
>
> The issue I could see happening here is ensuring the topdocs searching has the ability to deduplicate (if desired) based on parent document ID.
>
> Did you consider utilizing this when digging into this implementation?

I think it's a good idea to create a new dedicated `MultiValueNeighborQueue`. I'll do it when I have time, but feel free to do it if you like!

Regarding index-time join, I am not sure it's relevant here (are you talking about block join?). Isn't it a different concept from multi-valued fields? That is, Lucene already offers that mechanism alongside multi-valued support for pretty much all the other field types, doesn't it?
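
To make the visited-documents versus visited-vectors distinction concrete, here is a minimal sketch of what a document-aware queue might track. Everything in it is an assumption for illustration: the class name `MultiValueNeighborQueueSketch`, its methods, and the choice of keeping the maximum score per document are hypothetical, and it does not use or mirror Lucene's actual `NeighborQueue` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not Lucene's NeighborQueue. */
final class MultiValueNeighborQueueSketch {
  private final int k;
  // Best score seen so far for each document (assumption: max over its vectors).
  private final Map<Integer, Float> bestScoreByDoc = new HashMap<>();
  private int visitedVectors = 0;

  MultiValueNeighborQueueSketch(int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    this.k = k;
  }

  /** Records one scored vector belonging to the given document. */
  void collect(int docId, float score) {
    visitedVectors++;
    // Several vectors of the same document collapse into a single entry.
    bestScoreByDoc.merge(docId, score, Float::max);
  }

  /** Number of vectors scored, counting every vector of a document. */
  int visitedVectorCount() {
    return visitedVectors;
  }

  /** Number of distinct documents seen; the count the quoted comment asks for. */
  int visitedDocCount() {
    return bestScoreByDoc.size();
  }

  /** Up to k document IDs, best score first, ties broken by lower document ID. */
  List<Integer> topDocs() {
    List<Map.Entry<Integer, Float>> entries = new ArrayList<>(bestScoreByDoc.entrySet());
    entries.sort((a, b) -> {
      int byScore = Float.compare(b.getValue(), a.getValue()); // higher score first
      return byScore != 0 ? byScore : Integer.compare(a.getKey(), b.getKey());
    });
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < Math.min(k, entries.size()); i++) {
      result.add(entries.get(i).getKey());
    }
    return result;
  }
}
```

In this sketch, collecting two vectors for the same document increases the vector count by two but the document count by one, and the document appears at most once in `topDocs()`. A real implementation would more likely bound memory with a heap instead of sorting all entries at the end.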