msokolov commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1656953197
> The main issue is that it won't return the correct number of parent documents when the user requests the top-k parents based on their children vectors. If there are multiple children per parent, this approach may return fewer than k parent documents.

Thanks, I see now. So this is similar in spirit to the existing problem where we want the top K documents (by vector distance) satisfying some constraints, and we have to choose between two options: find some higher number of nearest docs (K') in the hope that at least K of them will satisfy the constraints (post-filtering), or apply the filters while searching, which guarantees the top K. I just want to note that both approaches have merit; it's a tradeoff depending on how restrictive the filters are, but for not-very-restrictive filters, post-filtering can outperform.

In this case I guess there is a similar tradeoff relating to how many child documents there typically are. If it's a small number (say c children per parent), it may be better to use KNN search with K' = c * K. It would be interesting to compare these two approaches to see whether we can provide some guidance, or even some kind of API that chooses between them.
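
To make the K' = c * K idea concrete, here is a rough sketch of the post-filtering variant. It is plain Java, not Lucene code: `ParentPostFilter`, `ChildHit`, `overFetchSize`, and `topParents` are hypothetical names made up for illustration, and the child hits are assumed to come from some child-level KNN search with scores where higher is better. If no parent has more than c children, any c * K distinct child hits must cover at least K distinct parents; if some parents have more, the result can still come up short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of post-filtering child hits down to parents; not a Lucene API. */
final class ParentPostFilter {

    /** One child-level KNN hit: child doc id, owning parent id, similarity (higher is better). */
    static final class ChildHit {
        final int childDoc;
        final int parentDoc;
        final float score;

        ChildHit(int childDoc, int parentDoc, float score) {
            this.childDoc = childDoc;
            this.parentDoc = parentDoc;
            this.score = score;
        }
    }

    /** K' = c * K: how many child hits to request when we want K parents. */
    static int overFetchSize(int k, int childrenPerParent) {
        return Math.multiplyExact(k, childrenPerParent);
    }

    /**
     * Collapses child hits to parents, scoring each parent by its best child,
     * and returns at most k parent ids, best first. May return fewer than k
     * if the child hits cover fewer than k distinct parents.
     */
    static List<Integer> topParents(List<ChildHit> childHits, int k) {
        // Keep the best child score seen for each parent.
        Map<Integer, Float> bestByParent = new LinkedHashMap<>();
        for (ChildHit hit : childHits) {
            bestByParent.merge(hit.parentDoc, hit.score, Float::max);
        }

        // Order parents by best child score, descending.
        List<Map.Entry<Integer, Float>> entries = new ArrayList<>(bestByParent.entrySet());
        entries.sort(Map.Entry.<Integer, Float>comparingByValue().reversed());

        List<Integer> parents = new ArrayList<>();
        for (Map.Entry<Integer, Float> entry : entries) {
            if (parents.size() == k) {
                break;
            }
            parents.add(entry.getKey());
        }
        return parents;
    }
}
```

A caller would ask the child-level search for `overFetchSize(k, c)` hits and pass them to `topParents(hits, k)`. Passing only the top k child hits instead reproduces the problem quoted above: if several of those children share a parent, fewer than k parents come back.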