On 8/4/2015 5:19 AM, David Santamauro wrote:
>
> I have a question about how the stat 'requests' is calculated. I would
> really appreciate it if anyone could shed some light on the figures below.
>
> Assumptions:
> version: 5.2.0
> layout: 8 node solrcloud, no replicas (node71-node78)
> collection: col1
> handler: /search
> stats request: /col1/admin/mbeans?stats=true&cat=QUERYHANDLER&wt=json'
>
> I wrote a simple shell script that grabs the requests stats member from
> every node.
>
> After collection reload
> node 71 -- requests: 2
> node 72 -- requests: 2
> node 73 -- requests: 2
> node 74 -- requests: 2
> node 75 -- requests: 2
> node 76 -- requests: 2
> node 77 -- requests: 2
> node 78 -- requests: 2
> * I assume these are the auto-warm searches
>
>
> After submitting 1 request (q=*:*)
> node 71 -- requests: 4
> node 72 -- requests: 3
> node 73 -- requests: 3
> node 74 -- requests: 3
> node 75 -- requests: 3
> node 76 -- requests: 4
> node 77 -- requests: 3
> node 78 -- requests: 3
>
> After resubmitting the same request
> node 71 -- requests: 6
> node 72 -- requests: 4
> node 73 -- requests: 4
> node 74 -- requests: 4
> node 75 -- requests: 4
> node 76 -- requests: 5
> node 77 -- requests: 5
> node 78 -- requests: 4
>
> If that wasn't strange enough, things get out of control if I add in
> facet.pivot parameter(s)
>
> Fresh after reload (see above, 2 for every node)
>
> Total after a facet.pivot on two fields
> node 71 -- requests: 13
> node 72 -- requests: 15
> node 73 -- requests: 14
> node 74 -- requests: 12
> node 75 -- requests: 14
> node 76 -- requests: 12
> node 77 -- requests: 14
> node 78 -- requests: 12
>
> I imagine I'm seeing the internal cross-talk between nodes and if so,
> how can one reliably keep stats on the number of "real" requests?

Queries on distributed indexes change from the one request that you make into a request to every shard, to check for relevant documents. If relevant documents are found, a second call to those specific shards is made to retrieve those documents. So if you have 5 shards in your index, there could be up to 11 requests counted for a single query. If all the shards are on separate nodes, then for that 11-request query, one of those nodes would count three requests and the others would count two.

I know what I'm going to say next would work on an index that is distributed but *not* SolrCloud, and I think it will work in SolrCloud too.

If you add a "shards.qt" parameter to defaults in your main request handler (usually /select) that points at another, identically configured handler (perhaps named "/shards") that is also in solrconfig.xml, then that other handler should receive the distributed requests and the main handler should only count the "real" requests. You would be able to track those numbers separately.

Thanks,
Shawn
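
To make the request arithmetic above concrete, here is a minimal sketch that tallies per-node counts for one distributed query. It is only a simplified model of the two-phase behaviour described in the reply, not Solr code: the function name `count_requests` and its parameters are hypothetical, and it assumes one shard per node with no replicas.

```python
def count_requests(num_shards, coordinator=0, shards_with_hits=None):
    """Model per-node 'requests' increments for a single distributed query.

    Simplified model (an assumption, not Solr's implementation):
      - the node that receives the client's query counts it once,
      - every shard gets one request to check for relevant documents,
      - each shard holding relevant documents gets a second request
        to retrieve them.
    """
    if shards_with_hits is None:
        # Worst case: every shard contributes documents to the result.
        shards_with_hits = range(num_shards)

    counts = [0] * num_shards
    counts[coordinator] += 1          # the "real" client request
    for shard in range(num_shards):
        counts[shard] += 1            # first phase: request to every shard
    for shard in shards_with_hits:
        counts[shard] += 1            # second phase: retrieve documents
    return counts


if __name__ == "__main__":
    counts = count_requests(5)
    print(counts, sum(counts))        # [3, 2, 2, 2, 2] 11
```

In the worst case this gives 2N + 1 requests for N shards, which is where the 11 for 5 shards comes from. When only some shards hold relevant documents, the total is lower and the counts are uneven across nodes.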