[ https://issues.apache.org/jira/browse/LUCENE-8996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953080#comment-16953080 ]
Christine Poerschke commented on LUCENE-8996:
---------------------------------------------

Looking at the {{TopGroupsTest}} portion of both the patch and the pull request for this ticket, I had some "there's a lot of numbers here" thoughts. It (subjectively, of course) seemed to me a little tricky to work out what they all are (numbers for shard index, doc id, group value, scores and hit counts, sometimes NaN not-a-number numbers), what they mean, and why the expected test results are correct.

The LUCENE-9010 sub-task proposes to split out the addition of test coverage for the existing code from the 'maxScore missing' fix here. The first proposed patch for it tries to reduce the "amount of numbers": instead of integer group values 1 and 2 there are string group values "red" and "blue", and a narrative and local variable names (redAntScore, blueDragonflyScore, redSquirrelScore, blueWhaleScore) try to make it easier to work out what the {{expectedMaxScore}} value is.

> maxScore is sometimes missing from distributed grouped responses
> ----------------------------------------------------------------
>
>                 Key: LUCENE-8996
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8996
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.3
>            Reporter: Julien Massenet
>            Priority: Minor
>         Attachments: LUCENE-8996.patch, lucene_6_5-GroupingMaxScore.patch,
>                      lucene_solr_5_3-GroupingMaxScore.patch, master-GroupingMaxScore.patch
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue occurs when using the grouping feature in distributed mode and
> sorting by score.
> Each group's {{doclist}} in the response is supposed to contain a
> {{maxScore}} entry that holds the maximum score for that group. Using the
> current releases, it sometimes happens that this piece of information is not
> included:
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 42,
>     "params": {
>       "sort": "score desc",
>       "fl": "id,score",
>       "q": "_text_:\"72\"",
>       "group.limit": "2",
>       "group.field": "group2",
>       "group.sort": "score desc",
>       "group": "true",
>       "wt": "json",
>       "fq": "group2:72 OR group2:45"
>     }
>   },
>   "grouped": {
>     "group2": {
>       "matches": 567,
>       "groups": [
>         {
>           "groupValue": 72,
>           "doclist": {
>             "numFound": 562,
>             "start": 0,
>             "maxScore": 2.0378063,
>             "docs": [
>               {
>                 "id": "29!26551",
>                 "score": 2.0378063
>               },
>               {
>                 "id": "78!11462",
>                 "score": 2.0298104
>               }
>             ]
>           }
>         },
>         {
>           "groupValue": 45,
>           "doclist": {
>             "numFound": 5,
>             "start": 0,
>             "docs": [
>               {
>                 "id": "72!8569",
>                 "score": 1.8988966
>               },
>               {
>                 "id": "72!14075",
>                 "score": 1.5191172
>               }
>             ]
>           }
>         }
>       ]
>     }
>   }
> }
> {code}
> Looking into the issue, it comes from the fact that if a shard does not
> contain a document from that group, trying to merge its {{maxScore}} with
> real {{maxScore}} entries from other shards is invalid (it results in NaN).
> I'm attaching a patch containing a fix.
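
To illustrate the NaN behaviour described in the issue, here is a minimal standalone sketch. It is not the code from the attached patches or from Lucene's {{TopGroups}}; the class and method names ({{MaxScoreMergeSketch}}, {{nanAwareMax}}, {{mergeMaxScores}}) are hypothetical, and it assumes that a shard with no documents for a group reports {{Float.NaN}} as its {{maxScore}}. It relies only on the documented behaviour of {{Math.max}}, which returns NaN if either argument is NaN.

```java
// Hypothetical sketch, not Lucene's actual merge code.
class MaxScoreMergeSketch {

    // Treat NaN as "this shard had no score for the group" and ignore it.
    static float nanAwareMax(float a, float b) {
        if (Float.isNaN(a)) {
            return b;
        }
        if (Float.isNaN(b)) {
            return a;
        }
        return Math.max(a, b);
    }

    // Merge the per-shard maxScore values for one group.
    // The result is NaN only if every shard reported NaN.
    static float mergeMaxScores(float[] shardMaxScores) {
        float merged = Float.NaN;
        for (float shardMaxScore : shardMaxScores) {
            merged = nanAwareMax(merged, shardMaxScore);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed scenario: the middle shard holds no document of the group.
        float[] shardMaxScores = {1.8988966f, Float.NaN, 1.5191172f};

        // A plain Math.max is poisoned by the NaN entry.
        System.out.println(Math.max(shardMaxScores[0], shardMaxScores[1])); // NaN

        // The NaN-aware merge keeps the largest real score.
        System.out.println(mergeMaxScores(shardMaxScores));
    }
}
```

Whether a group with no scores on any shard should end up as NaN or have its {{maxScore}} omitted is a separate design choice that this sketch does not settle.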