[ https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440873#comment-17440873 ]
Zach Chen edited comment on LUCENE-10061 at 11/9/21, 3:54 AM:
--------------------------------------------------------------

{quote}Thanks for exploring this area [~zacharymorn]!
{quote}
No problem, I'm always interested in exploring and learning about Lucene querying!

{quote}I wonder if LUCENE-9335 could be helpful to reduce the overhead of pruning, since Maxscore tends to have lower overhead than WAND.
{quote}
Based on my current understanding and testing of CombinedFieldQuery, I think WANDScorer is actually not used there ([it essentially never gets rewritten to a BooleanQuery|https://github.com/apache/lucene/blob/ded77d8bfdcdbf7cc2547e67833434a56f2edd16/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L256-L261]). In addition, the PR already does a Maxscore-like calculation based on competitive impacts to skip docs. Am I missing anything here?

{quote}I see that you tested with 4 and 2 as boost values. I wonder if it makes a difference if you try out e.g. 20 and 1 instead. I just looked again at table 3.1 in [https://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf] and the optimal weights that they found for title/body were 38.4/1 on one dataset and 13.5/1 on another dataset.
{quote}
Sounds good, will give that a try!

was (Author: zacharymorn):
{quote}Thanks for exploring this area [~zacharymorn]!
{quote}
No problem, I'm always interested in exploring and learning about Lucene querying!

{quote}I wonder if LUCENE-9335 could be helpful to reduce the overhead of pruning, since Maxscore tends to have lower overhead than WAND.
{quote}
Based on my current understanding and testing of CombinedFieldQuery, I think WANDScorer is actually not used there ([it doesn't get rewritten to a BooleanQuery most of the time|https://github.com/apache/lucene/blob/ded77d8bfdcdbf7cc2547e67833434a56f2edd16/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L256-L261]).
In addition, the PR already does a Maxscore-like calculation based on competitive impacts to skip docs. Am I missing anything here?

{quote}I see that you tested with 4 and 2 as boost values. I wonder if it makes a difference if you try out e.g. 20 and 1 instead. I just looked again at table 3.1 in [https://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf] and the optimal weights that they found for title/body were 38.4/1 on one dataset and 13.5/1 on another dataset.
{quote}
Sounds good, will give that a try!

> CombinedFieldsQuery needs dynamic pruning support
> -------------------------------------------------
>
>                 Key: LUCENE-10061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10061
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: CombinedFieldQueryTasks.wikimedium.10M.nostopwords.tasks
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore,
> forcing Lucene to collect all matches in order to figure out the top-k hits.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
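The issue description says that without advanceShallow/getMaxScore, every match has to be collected to find the top-k hits. The following is a minimal, hypothetical Java sketch of why per-block score upper bounds avoid that. It is not Lucene code and not the PR's implementation. It assumes a BM25F-style combined score (per-field term frequencies and field lengths are multiplied by the field weights and summed, then a simplified BM25 saturation is applied), and it assumes docs are grouped into fixed-size blocks with a known max combined frequency and min combined length per block. The class name `CombinedFieldPruningSketch`, the methods `score`/`weightedSum`/`topK`, the `K1`/`B` values and the block layout are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch, not Lucene code: block-max style pruning for a
 * BM25F-like combined-field score.
 */
final class CombinedFieldPruningSketch {
  // Illustrative BM25 parameters (assumed values).
  static final double K1 = 1.2;
  static final double B = 0.75;

  /** Simplified BM25-style saturation applied to the combined frequency and length. */
  static double score(double idf, double tf, double len, double avgLen) {
    double norm = K1 * (1 - B + B * len / avgLen);
    return idf * tf / (tf + norm);
  }

  /** Weighted sum over fields, e.g. combined tf = sum over f of weight_f * tf_f. */
  static double weightedSum(double[] weights, int[] perField) {
    double sum = 0;
    for (int f = 0; f < weights.length; f++) {
      sum += weights[f] * perField[f];
    }
    return sum;
  }

  /**
   * Returns the top-k scores in descending order (k must be at least 1).
   * freqs[doc][field] holds per-field term frequencies, lens[doc][field] per-field
   * lengths. scored[0] is incremented once per doc whose exact score is computed.
   */
  static List<Double> topK(double[] weights, int[][] freqs, int[][] lens, double avgLen,
                           double idf, int k, int blockSize, int[] scored) {
    PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap holding the best k scores
    for (int start = 0; start < freqs.length; start += blockSize) {
      int end = Math.min(start + blockSize, freqs.length);
      // Block metadata: max combined tf and min combined length. A real index would
      // precompute this at index time; it is computed inline here only to keep the
      // sketch self-contained, so this loop itself saves no work.
      double maxTf = 0;
      double minLen = Double.MAX_VALUE;
      for (int d = start; d < end; d++) {
        maxTf = Math.max(maxTf, weightedSum(weights, freqs[d]));
        minLen = Math.min(minLen, weightedSum(weights, lens[d]));
      }
      // With idf > 0, score() grows with tf and shrinks with len, so this value
      // is an upper bound for every doc in the block.
      double blockMax = score(idf, maxTf, minLen, avgLen);
      // The k-th best score so far is the minimum competitive score:
      // a block that cannot beat it is skipped entirely.
      if (heap.size() == k && blockMax <= heap.peek()) {
        continue;
      }
      for (int d = start; d < end; d++) {
        double s = score(idf, weightedSum(weights, freqs[d]), weightedSum(weights, lens[d]), avgLen);
        scored[0]++;
        if (heap.size() < k) {
          heap.add(s);
        } else if (s > heap.peek()) {
          heap.poll();
          heap.add(s);
        }
      }
    }
    List<Double> out = new ArrayList<>(heap);
    out.sort(Collections.reverseOrder());
    return out;
  }
}
```

For example, with title/body weights of 20 and 1, k = 2, blockSize = 2, and six docs where only the first two contain the term in the title, only the first block is scored exactly. The other two blocks have bounds below the second-best score and are skipped, and the returned top-2 scores are identical to a single-block (unpruned) run. Skipping uses `<=` while insertion requires a strictly greater score, so ties can never change the result.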