msokolov commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1656953197

   > The main issue is that it won't return the correct number of parent 
documents when the user requests the top-k parents based on their children 
vectors. If there are multiple children per parent, this approach may return 
fewer than k parent documents.
   
   Thanks, I see now. So this is similar in spirit to the existing problem 
where we want the top K documents (by vector distance) that satisfy some 
constraints, and we have to choose between finding a larger number of nearest 
docs (K') in the hope that at least K of them satisfy the constraints 
(post-filtering), and applying the filters during the search, which guarantees 
the top K. I just want to note that both approaches have merit: it's a tradeoff 
that depends on how restrictive the filters are, and for filters that are not 
very restrictive, post-filtering can outperform. In this case I guess there is a 
similar tradeoff related to how many child documents there typically are per 
parent. If that number is small (say c children per parent), it may be better 
to use KNN search with K' = c * K. It would be interesting to compare these two 
approaches to see whether we can provide some guidance, or even some kind of API 
that chooses between them.
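To make the post-filtering side of this tradeoff concrete, here is a minimal sketch in plain Java. It assumes the K' nearest child hits are already available, sorted by descending score, each tagged with its parent id. `ChildHit` and `topKParents` are hypothetical names for illustration, not Lucene APIs. The sketch collapses child hits to parents (a parent is ranked by its best child) and shows how K' = c * K can still yield fewer than K parents when one parent's children fill the whole neighborhood.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ParentCollapse {

    /** Hypothetical child hit: the child doc, the parent it belongs to, and its vector similarity score. */
    record ChildHit(int childDoc, int parentDoc, float score) {}

    /**
     * Post-filtering sketch: collapse the K' nearest children (sorted by descending score)
     * to their parents. A parent's rank is that of its best (first-seen) child.
     * Returns at most k parent ids, and fewer if many hits share a parent.
     */
    static List<Integer> topKParents(List<ChildHit> nearestChildren, int k) {
        // LinkedHashSet keeps first-seen order, which is best-score order for sorted input
        Set<Integer> parents = new LinkedHashSet<>();
        for (ChildHit hit : nearestChildren) {
            if (parents.size() == k) {
                break; // already have k distinct parents
            }
            parents.add(hit.parentDoc());
        }
        return new ArrayList<>(parents);
    }

    public static void main(String[] args) {
        int k = 2;
        int c = 2; // assumed typical number of children per parent
        // The nearest children; here the top four all belong to parent 10
        List<ChildHit> nearest = List.of(
            new ChildHit(1, 10, 0.99f),
            new ChildHit(2, 10, 0.98f),
            new ChildHit(3, 10, 0.97f),
            new ChildHit(4, 10, 0.96f),
            new ChildHit(5, 20, 0.95f));
        // Over-fetch K' = c * K children, then collapse to parents
        List<ChildHit> kPrime = nearest.subList(0, Math.min(c * k, nearest.size()));
        System.out.println(topKParents(kPrime, k)); // prints [10]
    }
}
```

Here only one parent comes back even though k = 2, because parent 20's child ranks fifth and falls outside K' = 4. A search that collapses to parents while it runs would not have this gap, at the cost the comment describes.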


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

