[ https://issues.apache.org/jira/browse/LUCENE-8996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953078#comment-16953078 ]
Christine Poerschke commented on LUCENE-8996:
---------------------------------------------

{quote}... If you merge two groups with no real maxScores the final result will be MIN_VALUE (NaN would make more sense imo) ... {quote}

Yes, MIN_VALUE seems a quirky result for this edge case. Though if one were to change the existing behaviour, it might be clearest to do that separately from the 'maxScore missing' fix here: here we are removing an erroneous case of 'maxScore missing', and changing away from MIN_VALUE would add a legitimate case of 'maxScore missing'.

{quote}... this *should* never happen in theory because if no segment contains documents about group x it shouldn't be possible that we retrieve documents about group x in the first place. ... {quote}

I agree, in theory it should never happen, though in practice I think there is a timing window that could make it happen, however unlikely. The first pass of the distributed search could determine that there are segments with documents about group X, but by the time the second pass of the search runs a few moments later the document(s) in group X could all have been deleted?

> maxScore is sometimes missing from distributed grouped responses
> ----------------------------------------------------------------
>
>                 Key: LUCENE-8996
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8996
>             Project: Lucene - Core
>          Issue Type: Bug
>   Affects Versions: 5.3
>            Reporter: Julien Massenet
>            Priority: Minor
>         Attachments: LUCENE-8996.patch, lucene_6_5-GroupingMaxScore.patch,
>                      lucene_solr_5_3-GroupingMaxScore.patch, master-GroupingMaxScore.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue occurs when using the grouping feature in distributed mode and
> sorting by score.
> Each group's {{docList}} in the response is supposed to contain a
> {{maxScore}} entry that holds the maximum score for that group.
> With the current releases, this piece of information is sometimes not
> included (note the missing {{maxScore}} in the second group below):
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 42,
>     "params": {
>       "sort": "score desc",
>       "fl": "id,score",
>       "q": "_text_:\"72\"",
>       "group.limit": "2",
>       "group.field": "group2",
>       "group.sort": "score desc",
>       "group": "true",
>       "wt": "json",
>       "fq": "group2:72 OR group2:45"
>     }
>   },
>   "grouped": {
>     "group2": {
>       "matches": 567,
>       "groups": [
>         {
>           "groupValue": 72,
>           "doclist": {
>             "numFound": 562,
>             "start": 0,
>             "maxScore": 2.0378063,
>             "docs": [
>               {
>                 "id": "29!26551",
>                 "score": 2.0378063
>               },
>               {
>                 "id": "78!11462",
>                 "score": 2.0298104
>               }
>             ]
>           }
>         },
>         {
>           "groupValue": 45,
>           "doclist": {
>             "numFound": 5,
>             "start": 0,
>             "docs": [
>               {
>                 "id": "72!8569",
>                 "score": 1.8988966
>               },
>               {
>                 "id": "72!14075",
>                 "score": 1.5191172
>               }
>             ]
>           }
>         }
>       ]
>     }
>   }
> }
> {code}
> Looking into the issue, the cause is that when a shard does not contain
> a document from that group, merging its {{maxScore}} with the real
> {{maxScore}} entries from other shards is invalid (it results in NaN).
> I'm attaching a patch containing a fix.
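
To illustrate the merge behaviour discussed above, here is a minimal, self-contained Java sketch. It is not the code from the attached patches: the class name {{MaxScoreMergeSketch}} and both methods are hypothetical, and it assumes that a shard with no documents for a group reports its {{maxScore}} as NaN and that the merge starts from {{Float.MIN_VALUE}}, as the comment above suggests. What it relies on is only that {{Math.max}} returns NaN whenever either argument is NaN.

```java
// Hypothetical sketch, not the Lucene/Solr implementation.
class MaxScoreMergeSketch {

    // Naive merge: a single NaN from any shard poisons the result,
    // because Math.max(x, NaN) is NaN.
    static float naiveMerge(float[] shardMaxScores) {
        float merged = Float.MIN_VALUE; // assumed starting value
        for (float score : shardMaxScores) {
            merged = Math.max(merged, score);
        }
        return merged;
    }

    // NaN-aware merge: shards that report NaN (assumed to mean
    // "no documents for this group") are skipped.
    static float nanAwareMerge(float[] shardMaxScores) {
        float merged = Float.MIN_VALUE; // assumed starting value
        for (float score : shardMaxScores) {
            if (!Float.isNaN(score)) {
                merged = Math.max(merged, score);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // One shard has documents for the group, the other has none.
        float[] mixed = {1.8988966f, Float.NaN};
        System.out.println(Float.isNaN(naiveMerge(mixed)));   // true
        System.out.println(nanAwareMerge(mixed) == 1.8988966f); // true

        // Edge case from the comment: no shard has a real maxScore.
        float[] empty = {Float.NaN, Float.NaN};
        System.out.println(nanAwareMerge(empty) == Float.MIN_VALUE); // true
    }
}
```

With the naive merge, the group with {{groupValue}} 45 would end up with a NaN {{maxScore}}, which would match the missing entry in the response. With the NaN-aware merge the real score survives, and the all-NaN edge case yields {{Float.MIN_VALUE}}, the quirky result noted in the comment.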