[ 
https://issues.apache.org/jira/browse/LUCENE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073151#comment-17073151
 ] 

Ishan Chattopadhyaya commented on LUCENE-9302:
----------------------------------------------

Thanks David, I've moved this to a Lucene issue now. I would like some early 
feedback on the patch here. Mainly, I'm curious whether there is a reason why 
totalHitCounts is an int rather than a long. Given that the merge() method's 
javadocs say the following, I think the total hits across all shards can 
legitimately overflow the int range (this is happening in our production 
cluster).

 
{code:java}
/** Merges an array of TopGroups, for example obtained
 *  from the second-pass collector across multiple
 *  shards.  Each TopGroups must have been sorted by the
 *  same groupSort and docSort, and the top groups passed
 *  to all second-pass collectors must be the same.
 *
 * <b>NOTE</b>: We can't always compute an exact totalGroupCount.
 * Documents belonging to a group may occur on more than
 * one shard and thus the merged totalGroupCount can be
 * higher than the actual totalGroupCount. In this case the
 * totalGroupCount represents an upper bound. If the documents
 * of one group do only reside in one shard then the
 * totalGroupCount is exact.
 *
 * <b>NOTE</b>: the topDocs in each GroupDocs is actually
 * an instance of TopDocsAndShards
 */
public static <T> TopGroups<T> merge(TopGroups<T>[] shardGroups, Sort groupSort,
    Sort docSort, int docOffset, int docTopN, ScoreMergeMode scoreMergeMode) {
{code}
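
To illustrate the overflow, here is a minimal standalone sketch, not the Lucene code itself. The class and method names (HitCountOverflow, sumAsInt, sumAsLong) and the two per-shard counts are hypothetical; the counts are chosen only so that they add up to the numFound of 3665450508 from the issue description. Accumulating them in an int wraps around to the same negative value seen in the grouped response, while a long accumulator keeps the true total.

```java
public class HitCountOverflow {

    // Hypothetical helper: accumulates per-shard hit counts in an int.
    // Java int addition wraps silently on overflow (no exception is thrown).
    static int sumAsInt(int[] shardHits) {
        int total = 0;
        for (int hits : shardHits) {
            total += hits;
        }
        return total;
    }

    // Hypothetical helper: same accumulation, but in a long,
    // so the sum of many int-sized shard counts stays exact.
    static long sumAsLong(int[] shardHits) {
        long total = 0L;
        for (int hits : shardHits) {
            total += hits;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed shard counts: each fits in an int, but their sum does not.
        int[] shardHits = {1_832_725_254, 1_832_725_254};

        System.out.println(sumAsInt(shardHits));  // prints -629516788
        System.out.println(sumAsLong(shardHits)); // prints 3665450508
    }
}
```

Each shard's own count can still fit in an int here; only the merged total needs the wider type.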

> Integer overflow in total count in grouping results
> ---------------------------------------------------
>
>                 Key: LUCENE-9302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9302
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ian
>            Assignee: Ishan Chattopadhyaya
>            Priority: Minor
>         Attachments: SOLR-13004.patch, SOLR-13004.patch
>
>
> When doing a grouping search in SolrCloud, you can get a negative number for 
> the total found.
> This is caused by the accumulated total being held in an int and not a 
> long.
>  
> example result:
> {{{ "responseHeader": { "status": 0, "QTime": 9231, "params": { "q": 
> "decade:200", "indent": "true", "fl": "decade", "wt": "json", "group.field": 
> "decade", "group": "true", "_": "1542773674247" } }, "grouped": { "decade": { 
> "matches": -629516788, "groups": [ { "groupValue": "200", "doclist": { 
> "numFound": -629516788, "start": 0, "maxScore": 1.9315376, "docs": [ { 
> "decade": "200" } ] } } ] } } }}}
>  
> {{result without grouping:}}
> {{{ "responseHeader": { "status": 0, "QTime": 1063, "params": { "q": 
> "decade:200", "indent": "true", "fl": "decade", "wt": "json", "_": 
> "1542773791855" } }, "response": { "numFound": 3665450508, "start": 0, 
> "maxScore": 1.9315376, "docs": [ { "decade": "200" }, { "decade": "200" }, { 
> "decade": "200" }, { "decade": "200" }, { "decade": "200" }, { "decade": 
> "200" }, { "decade": "200" }, { "decade": "200" }, { "decade": "200" }, { 
> "decade": "200" } ] } }}}
>  


