Jinny-Wang opened a new issue, #14565:
URL: https://github.com/apache/lucene/issues/14565

   ### Description
   
   ### Problem
   
   [ToChildBlockJoinQuery](https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/ToChildBlockJoinQuery.java) only supports a parentFilter and returns all children of the matched parent documents. To filter on the child documents, it has to be paired with a separate childQuery.
   The current logic for locating child docs in ToChildBlockJoinQuery,
    `childDoc = 1 + parentBits.prevSetBit(parentDoc - 1);`
   could be enhanced to obtain childDoc from a childIterator instead.
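
To make the current lookup concrete, here is a minimal sketch that uses `java.util.BitSet` as a stand-in for Lucene's parent bit set, with `previousSetBit` playing the role of `prevSetBit`. The class name, method name, and doc IDs are assumptions made for illustration; this is not the Lucene implementation.

```java
import java.util.BitSet;

// Hypothetical stand-in: java.util.BitSet replaces Lucene's parent BitSet.
class FirstChildSketch {
    // Returns the doc ID of the first child in the block that ends at parentDoc.
    // The previous parent (or -1 if there is none) marks the end of the prior block.
    static int firstChild(BitSet parentBits, int parentDoc) {
        return 1 + parentBits.previousSetBit(parentDoc - 1);
    }

    // Assumed layout: docs 0-2 are children of parent 3, docs 4-5 are children of parent 6.
    static BitSet exampleParents() {
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        return parents;
    }
}
```

With this layout, every doc from `firstChild(...)` up to `parentDoc - 1` is returned as a child, which is why the query on its own cannot filter or cap the children.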
   
   Furthermore, there is no limit on the number of child docs per parent, which can lead to over-fetching child docs from a single parent doc when the number of child docs per parent is highly skewed.
   
   
   For example, suppose we index parent and child documents as follows:
   
   **Block1**
   - Child1 : `title:bread`
   - Child2 : `title:flour`
   - Child3 : `title:milk`
   - Parent1
   
   **Block2** 
   - Child4 : `title:flower`
   - Child5 : `title:milk`
   - Parent2
   
   Suppose we'd like to fetch all children with `title:milk`, but consider only **1 child doc per parent** so that docs from more parents are considered. This should return child3 and child5, but there is currently no way to achieve that result.
   
   
   We can use a boolean query to combine a ToChildBlockJoinQuery and a TermQuery:
   ```
   // parentQuery selects the parents; parentsFilter marks all parent docs.
   ToChildBlockJoinQuery parentJoinQuery = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
   TermQuery childQuery = new TermQuery(new Term("title", "milk"));
   // Both clauses must match the same child document.
   BooleanQuery.Builder fullChildQuery = new BooleanQuery.Builder();
   fullChildQuery.add(new BooleanClause(parentJoinQuery, Occur.MUST));
   fullChildQuery.add(new BooleanClause(childQuery, Occur.MUST));
   ```
   
   A childLimitPerParent could be added to [ToChildBlockJoinQuery](https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/ToChildBlockJoinQuery.java). With a limit of 1, the ToChildBlockJoinQuery would return **child1** and **child4**, the first child of each block.
   
   However, ToChildBlockJoinQuery is not aware of the childQuery, which matches **child3** and **child5**. Because the two clauses have no child in common, the boolean query returns no match, even though child3 and child5 are the expected matches in this case.
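
The mismatch can be reproduced with a small plain-Java simulation of the two blocks above. This is only a sketch: the class and method names are hypothetical, children are numbered 1 to 5 as in the example, and no Lucene classes are involved. `limitThenFilter` models applying the limit inside the join and the child filter afterwards; `filterThenLimit` models the behaviour this issue asks for, where the limit counts only children that match the child filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simulation of the example blocks; not Lucene code.
class ChildLimitSketch {
    // Child titles per block, in index order; the parent doc is implied.
    static final List<List<String>> BLOCKS = List.of(
        List.of("bread", "flour", "milk"),  // Block1: child1, child2, child3
        List.of("flower", "milk"));         // Block2: child4, child5

    // Keep the first `limit` children of each block, then apply the child filter.
    static List<Integer> limitThenFilter(int limit, Predicate<String> childFilter) {
        List<Integer> hits = new ArrayList<>();
        int childId = 0;
        for (List<String> block : BLOCKS) {
            for (int i = 0; i < block.size(); i++) {
                childId++;
                if (i < limit && childFilter.test(block.get(i))) {
                    hits.add(childId);
                }
            }
        }
        return hits;
    }

    // Apply the child filter first, then keep up to `limit` matches per block.
    static List<Integer> filterThenLimit(int limit, Predicate<String> childFilter) {
        List<Integer> hits = new ArrayList<>();
        int childId = 0;
        for (List<String> block : BLOCKS) {
            int taken = 0;
            for (String title : block) {
                childId++;
                if (taken < limit && childFilter.test(title)) {
                    hits.add(childId);
                    taken++;
                }
            }
        }
        return hits;
    }
}
```

Calling `limitThenFilter(1, "milk"::equals)` yields an empty list, while `filterThenLimit(1, "milk"::equals)` yields children 3 and 5.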
   
   
   ### Solution 
   We propose the following two enhancements to ToChildBlockJoinQuery:
   
   - Add childLimitPerParent to ToChildBlockJoinQuery
   - Add a childQuery to ToChildBlockJoinQuery 
   
   Alternatively, we could introduce a new query operator (although the name ParentChildrenBlockJoinQuery is already taken for another purpose). However, the functionality is largely an enhancement of the existing ToChildBlockJoinQuery, so it would make more sense to introduce the two enhancements as part of ToChildBlockJoinQuery while still supporting the current API, which matches all child docs without any limit.
   Any ideas, suggestions, or feedback?

