Hi guys, this is the scenario we are studying: Solr 4.10.2 with 16 shards, and a Solr instance that aggregates the results by running a distributed query with shards=..... (all the shards).
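To make the failure mode concrete, here is a minimal, hypothetical model of that setup in plain Java (not Solr code). An aggregator submits one request per shard to a thread pool and waits up to a shared deadline, loosely mirroring the shards.tolerant switch. The shard count matches the 16 above. The latencies, the 300 ms timeout, the index of the slow shard and all names (ShardFanOut, query, shard) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of an aggregator fanning one query out to 16 shards.
// Not Solr code: latencies, timeout and names are assumptions.
class ShardFanOut {
    static final int SHARDS = 16;
    static final int SLOW_SHARD = 7;     // assumed: cold filter cache / heavy load
    static final long TIMEOUT_MS = 300;  // assumed deadline for the whole fan-out

    // Simulated shard request: fast shards answer in 10 ms, the slow one in 2 s.
    static Callable<String> shard(int id) {
        return () -> {
            Thread.sleep(id == SLOW_SHARD ? 2000 : 10);
            return "shard" + id;
        };
    }

    static List<String> query(boolean tolerant) {
        ExecutorService pool = Executors.newFixedThreadPool(SHARDS);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < SHARDS; i++) pending.add(pool.submit(shard(i)));

            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(TIMEOUT_MS);
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                long left = deadline - System.nanoTime();
                try {
                    results.add(f.get(Math.max(left, 0L), TimeUnit.NANOSECONDS));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (TimeoutException | ExecutionException e) {
                    if (!tolerant) {
                        // Non-tolerant: cancel every outstanding request and rethrow.
                        for (Future<String> p : pending) p.cancel(true);
                        throw new IllegalStateException("shard request failed", e);
                    }
                    f.cancel(true); // tolerant: drop this shard, keep partial results
                }
            }
            return results;
        } finally {
            pool.shutdownNow(); // interrupts any task still sleeping
        }
    }

    public static void main(String[] args) {
        System.out.println("tolerant: " + query(true).size() + " of " + SHARDS + " shards");
        try {
            query(false);
        } catch (IllegalStateException e) {
            System.out.println("non-tolerant: " + e.getMessage());
        }
    }
}
```

With these assumed numbers the tolerant call returns the 15 fast shards, while the non-tolerant call fails as soon as the slow shard misses the deadline.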
Currently we are not using shards.tolerant=true, so an error from any shard makes the whole request throw an exception. We are in a situation where a shard is too slow to respond (empty filter cache, heavy load). Given the timeout the shard handler expects, that shard is not fast enough, and for this reason the whole request fails.

So far, everything is clear. We need to improve the speed of the shards by properly managing auto-warming, load balancing, etc. We can also play with the tolerant setting and possibly be tolerant of errors.

But what happens is that the Solr aggregator, which runs the queries against the shards, is exhausting its threads...

Looking into the code, when we are not tolerant we get this:

    // Was there an exception?
    if (srsp.getException() != null) {
      // If things are not tolerant, abort everything and rethrow
      if (!tolerant) {
        shardHandler1.cancelAll();
        if (srsp.getException() instanceof SolrException) {
          throw (SolrException) srsp.getException();
        } else {
          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
              srsp.getException());
        }
      }
      // ...
    }

I would assume that shardHandler1.cancelAll() is responsible for cleaning up the threads. Any idea why the thread cleanup might not happen properly? Could it be some Jetty misconfiguration?

Cheers

--------------------------
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
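One hypothesis worth checking, sketched in plain Java rather than Solr code: it assumes (not verified here against the 4.10.2 source) that cancelAll() cancels the pending shard requests with java.util.concurrent.Future.cancel(true). That call only sets the interrupt flag on the worker thread. A thread blocked in something that does not react to interrupts (in general, a blocking read on a plain java.net.Socket stream is such a case) keeps its thread until the call returns by itself or a socket timeout fires. The class CancelDemo and its methods are hypothetical, and a busy loop stands in for the non-interruptible read.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not Solr code: Future.cancel(true) only interrupts the
// worker thread; a task that never observes the interrupt keeps the thread busy.
class CancelDemo {
    static final AtomicInteger stillRunning = new AtomicInteger();

    // Stands in for a shard request stuck in work that ignores interrupts
    // (assumption: e.g. a blocking read on a plain socket).
    static Runnable stuckRequest(long millis, CountDownLatch started) {
        return () -> {
            stillRunning.incrementAndGet();
            started.countDown();
            long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
            while (System.nanoTime() < end) {
                // busy wait: the interrupt flag set by cancel(true) is never checked
            }
            stillRunning.decrementAndGet();
        };
    }

    // Returns {cancelled (1/0), tasks running right after cancel, tasks running after the pool drained}.
    static int[] demo() {
        CountDownLatch started = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(stuckRequest(500, started));
        try {
            started.await();                            // the task is now running
            boolean cancelled = f.cancel(true);         // sets the interrupt flag, nothing more
            int rightAfterCancel = stillRunning.get();  // the thread is not freed yet
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS); // waits for the task to end on its own
            return new int[] {cancelled ? 1 : 0, rightAfterCancel, stillRunning.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("cancelled=" + (r[0] == 1)
                + ", running right after cancel=" + r[1]
                + ", running after the pool drained=" + r[2]);
    }
}
```

The future reports itself as cancelled immediately, yet the pool thread stays occupied until the stuck work finishes. If the aggregator's shard requests behave like this, each aborted request could keep holding a thread until its socket read completes or times out.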