Jinny-Wang opened a new issue, #14565: URL: https://github.com/apache/lucene/issues/14565
### Description

### Problem

The [ToChildBlockJoinQuery](https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/ToChildBlockJoinQuery.java) only supports a parentFilter and returns all the children of the matched parent documents. To filter on the child documents, we need to pair it with a separate childQuery. The current logic for fetching child docs in ToChildBlockJoinQuery, `childDoc = 1 + parentBits.prevSetBit(parentDoc - 1);`, could instead obtain childDoc from a child iterator.

Furthermore, there is no limit on child docs per parent, which can lead to over-fetching child docs from a single parent when the number of child docs per parent is highly skewed.

For example, suppose we index parent and child documents as follows:

**Block1**
- Child1: `title:bread`
- Child2: `title:flour`
- Child3: `title:milk`
- Parent1

**Block2**
- Child4: `title:flower`
- Child5: `title:milk`
- Parent2

We'd like to fetch all children with `title:milk`, but consider only **1 child doc per parent** so that docs from more parents are considered. This should return Child3 and Child5, but currently there is no way to achieve that match.

We can use a boolean query to combine a ToChildBlockJoinQuery and a TermQuery:

```
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.ToChildBlockJoinQuery;

ToChildBlockJoinQuery parentJoinQuery = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
TermQuery childQuery = new TermQuery(new Term("title", "milk"));
BooleanQuery.Builder fullChildQuery = new BooleanQuery.Builder();
fullChildQuery.add(new BooleanClause(parentJoinQuery, Occur.MUST));
fullChildQuery.add(new BooleanClause(childQuery, Occur.MUST));
```

Suppose a childLimitPerParent were added to [ToChildBlockJoinQuery](https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/ToChildBlockJoinQuery.java). With a limit of 1, the ToChildBlockJoinQuery would return only the first child of each block: **Child1** and **Child4**.
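The Block1/Block2 example can be traced without Lucene. The following is a minimal sketch, not Lucene code: the class `ChildLimitSketch`, its arrays, and both method names are hypothetical, and it assumes that both parents match the parent query. It contrasts applying a per-parent limit of 1 before the `title:milk` filter with applying the filter first and the limit second.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the two blocks; not part of Lucene.
class ChildLimitSketch {
    // Docs in index order, parent last in each block:
    // 0=Child1, 1=Child2, 2=Child3, 3=Parent1, 4=Child4, 5=Child5, 6=Parent2
    static final String[] TITLES = {"bread", "flour", "milk", null, "flower", "milk", null};
    static final boolean[] IS_PARENT = {false, false, false, true, false, false, true};

    // Limit applied by the join alone, then intersected with the term filter.
    static List<Integer> limitThenFilter(String term, int limit) {
        List<Integer> hits = new ArrayList<>();
        int taken = 0;
        for (int doc = 0; doc < TITLES.length; doc++) {
            if (IS_PARENT[doc]) { taken = 0; continue; } // next doc starts a new block
            if (taken < limit) {
                taken++; // the slot is used whether or not the child matches the term
                if (term.equals(TITLES[doc])) hits.add(doc);
            }
        }
        return hits;
    }

    // Term filter applied first; the limit counts only matching children.
    static List<Integer> filterThenLimit(String term, int limit) {
        List<Integer> hits = new ArrayList<>();
        int taken = 0;
        for (int doc = 0; doc < TITLES.length; doc++) {
            if (IS_PARENT[doc]) { taken = 0; continue; }
            if (taken < limit && term.equals(TITLES[doc])) {
                taken++;
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(limitThenFilter("milk", 1)); // prints []
        System.out.println(filterThenLimit("milk", 1)); // prints [2, 5]
    }
}
```

In this model, doc IDs 2 and 5 correspond to Child3 and Child5. The limit only yields them when it is counted over children that already satisfy the child filter, which is why the limit and the child query would need to live in the same query.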
However, ToChildBlockJoinQuery is not aware of the matches of childQuery, which are **Child3** and **Child5**. As a result, the boolean query returns no match, even though Child3 and Child5 should match.

### Solution

We propose the following two enhancements to ToChildBlockJoinQuery:

- Add childLimitPerParent to ToChildBlockJoinQuery
- Add a childQuery to ToChildBlockJoinQuery

Alternatively, we could introduce a new query operator (though the name ParentChildrenBlockJoinQuery is already taken for other purposes). However, the functionality is largely an enhancement of the existing ToChildBlockJoinQuery, so it would make more sense to introduce the two enhancements as part of ToChildBlockJoinQuery while continuing to support the current API, which matches all child docs without any limit.

Any ideas, suggestions, or feedback?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@lucene.apache.org