Thanks, I’ll try that. Is the Thread Dump view in the Solr Admin panel not 
reliable for diagnosing thread hangs?
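
You can also take a thread dump, similar to what jstack produces, from inside a JVM through the standard java.lang.management API. Here is a minimal sketch in Java, since Solr runs on the JVM. It lists threads that are BLOCKED, WAITING, or TIMED_WAITING, the lock each one waits on and who holds it, and any deadlock the JVM detects. The class name ThreadDumpSketch is hypothetical. The sketch only inspects the JVM it runs in, so applying it to a Solr process would mean running it inside that JVM or connecting over remote JMX, which is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class ThreadDumpSketch {

    // Builds a short report of threads that are BLOCKED, WAITING, or TIMED_WAITING
    // in the current JVM, plus any threads the JVM detects as deadlocked.
    public static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true): also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();
            if (state == Thread.State.RUNNABLE || state == Thread.State.NEW
                    || state == Thread.State.TERMINATED) {
                continue; // only report threads that are parked or waiting on a lock
            }
            sb.append('"').append(info.getThreadName()).append("\" ").append(state);
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // The top few frames usually show where the thread is stuck
            StackTraceElement[] frames = info.getStackTrace();
            for (int i = 0; i < Math.min(3, frames.length); i++) {
                sb.append("    at ").append(frames[i]).append('\n');
            }
        }
        // findDeadlockedThreads returns null when no deadlock is found
        long[] deadlocked = mx.findDeadlockedThreads();
        if (deadlocked != null) {
            sb.append("Deadlocked thread ids: ").append(Arrays.toString(deadlocked)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Idle pool threads also appear as WAITING. The useful signal is usually many request threads parked at the same frame or on the same lock.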

On a different note, I am considering introducing a dedicated aggregator so 
that no shard has to handle both searching and aggregation, in case that dual 
role is part of the problem.
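
Here is one way a dedicated aggregator could work in a non-SolrCloud setup. This assumes the aggregator is a separate core that holds no documents. The distributed request goes to that core, and its own address stays out of the shards parameter, so it only fans out and merges results. The host names, core names, and the AggregatorQuerySketch class are hypothetical. URLEncoder.encode with a Charset argument needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class AggregatorQuerySketch {

    // Builds a /select URL for a dedicated aggregator core and rejects a shards
    // list that contains the aggregator itself, so it never doubles as a search shard.
    public static String buildUrl(String aggregatorBase, List<String> shards, String q) {
        // Assumes shards entries are written as host:port/solr/core, so drop the scheme to compare
        String self = aggregatorBase.replaceFirst("^https?://", "");
        if (shards.contains(self)) {
            throw new IllegalArgumentException("aggregator must not be listed as a shard: " + self);
        }
        return aggregatorBase + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&shards=" + URLEncoder.encode(String.join(",", shards), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical aggregator and shard addresses
        String url = buildUrl("http://agg-host:8983/solr/aggregator",
                List.of("shard1:8983/solr/core1", "shard2:8983/solr/core2"), "*:*");
        System.out.println(url);
    }
}
```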

Ronald S. Wood | Senior Software Developer
Smarsh

On 7/2/15, 3:56 PM, "Ryan, Michael F. (LNG-DAY)" <michael.r...@lexisnexis.com> wrote:

>Try running jstack on the aggregator - that will show you where the threads 
>are hanging.
>
>-Michael
>
>-----Original Message-----
>From: Ronald Wood [mailto:rw...@smarsh.com] 
>Sent: Thursday, July 02, 2015 3:37 PM
>To: solr-user@lucene.apache.org
>Subject: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4
>
>
>We are running into an issue when doing distributed queries on Solr 4.10.4. We 
>do not use SolrCloud but instead keep track of shards that need to be searched 
>based on date ranges.
>
>We have been running distributed queries without incident for several years 
>now, but we only recently upgraded to 4.10.4 from 4.8.1.
>
>The query is relatively simple and involves 4 shards, including the aggregator 
>itself.
>
>For a while, the server acting as the aggregator for the distributed query 
>handles requests fine. After an unpredictable period of use (typically 2-4 
>hours), it starts hanging on all distributed queries, while still serving the 
>non-distributed version of the same query (no shards list included) quickly 
>(9 ms).
>
>CPU, heap, and system memory usage do not seem unusual compared to other 
>servers.
>
>I initially suspected that distributed searches combined with faceting might 
>be part of the issue, since I had seen some threads that seemed to spend a 
>long time in the FastLRUCache when getting facets for a single field. However, 
>I am not seeing that in the latest case of blocked queries.
>
>We have two slaves that replicate from a master, and we saw the issue recur 
>after a period of client usage, which rules out a hardware issue.
>
>Does anyone have any suggestions for potential avenues of attack for getting 
>to the bottom of this? Or are there any known issues that could be implicated 
>in this?
>
>- Ronald S. Wood
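
The quoted message describes choosing shards by date range instead of relying on SolrCloud. Here is a minimal sketch of that selection. It assumes each shard covers a known, inclusive date range, keeps every shard whose range overlaps the query's range, and joins the result into the comma-separated value of the shards parameter. ShardRange, DateRangeShardsSketch, and the shard addresses are hypothetical, and the record syntax needs Java 16 or later.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DateRangeShardsSketch {

    // Hypothetical bookkeeping: the inclusive date range each shard covers
    public record ShardRange(String shard, LocalDate from, LocalDate to) {}

    // Returns the shards whose [from, to] range overlaps the query range [qFrom, qTo]
    public static List<String> select(List<ShardRange> ranges, LocalDate qFrom, LocalDate qTo) {
        List<String> out = new ArrayList<>();
        for (ShardRange r : ranges) {
            // Two inclusive ranges overlap iff each one starts no later than the other ends
            if (!r.from().isAfter(qTo) && !qFrom.isAfter(r.to())) {
                out.add(r.shard());
            }
        }
        return out;
    }

    // Joins the selected shards into the comma-separated value of the shards parameter
    public static String shardsParam(List<String> shards) {
        return String.join(",", shards);
    }
}
```

The resulting string would be sent as the shards parameter of the request to whichever core acts as the aggregator.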
