Yeah, that's no good. You might query each node directly with distrib=false to get the per-core doc counts.
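As a rough sketch of that per-core check (not a tested tool), something like the following Python could do it. It assumes each core answers a JSON select request at its own URL and reports its count as numFound under "response". The shard names and core URLs in the usage note are hypothetical placeholders, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_count_url(core_url):
    """Build a match-all query URL that is answered by this core only."""
    # distrib=false keeps the request from fanning out to other shards;
    # rows=0 returns just the count, not the documents.
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    return core_url.rstrip("/") + "/select?" + params


def parse_num_found(body):
    """Extract the document count from a JSON select response body."""
    return json.loads(body)["response"]["numFound"]


def fetch_count(core_url, timeout=10):
    """Query one core over HTTP and return its local document count."""
    with urlopen(core_count_url(core_url), timeout=timeout) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


def collect_counts(cores_by_shard, fetch=fetch_count):
    """Map shard name -> {core_url: count} for every listed core."""
    return {
        shard: {core: fetch(core) for core in cores}
        for shard, cores in cores_by_shard.items()
    }


def mismatched_shards(counts_by_shard):
    """Return only the shards whose cores disagree on the count."""
    return {
        shard: counts
        for shard, counts in counts_by_shard.items()
        if len(set(counts.values())) > 1
    }
```

To use it, pass a dict such as {"shard1": ["http://host1:8983/solr/collection1_shard1_replica1", "http://host2:8983/solr/collection1_shard1_replica2"]} (made-up hosts and core names) to collect_counts, then feed the result to mismatched_shards. Any shard it returns has a leader and replica that disagree, which is where I'd start reading logs.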
Which ones have what you think are the right counts, and which have the wrong ones? E.g., is it all replicas that are off, or leaders as well? You say several replicas went down; do you mean no leaders went down? You might look closer at the logs for a node that has its count off.

Finally, I guess I'd try to track it in a JIRA issue.

- Mark

On Apr 19, 2013, at 6:37 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> We had a rogue query take out several replicas in a large 4.2.0 cluster
> today, due to OOM's (we use the JVM args to kill the process on OOM).
>
> After recovering, when I execute the match all docs query (*:*), I get a
> different count each time.
>
> In other words, if I execute q=*:* several times in a row, then I get a
> different count back for numDocs.
>
> This was not the case prior to the failure as that is one thing we monitor
> for.
>
> I think I should be worried ... any ideas on how to troubleshoot this? One
> thing to mention is that several of my replicas had to do full recoveries
> from the leader when they came back online. Indexing was happening when the
> replicas failed.
>
> Thanks.
> Tim