We are running into an issue when doing distributed queries on Solr 4.10.4. We 
do not use SolrCloud but instead keep track of shards that need to be searched 
based on date ranges.
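
To make the setup concrete, here is a minimal sketch of the kind of 
date-range lookup I mean. The host names, core names, date ranges and the 
shards_for_range helper are all made up for illustration and are not our 
actual configuration or code.

```python
from datetime import date

# Hypothetical shard table: (first day covered, last day covered, shard address).
# These hosts and ranges are illustrative assumptions only.
SHARDS = [
    (date(2014, 1, 1), date(2014, 6, 30), "solr1.example.com:8983/solr/core1"),
    (date(2014, 7, 1), date(2014, 12, 31), "solr2.example.com:8983/solr/core1"),
    (date(2015, 1, 1), date(2015, 6, 30), "solr3.example.com:8983/solr/core1"),
]


def shards_for_range(start, end, table=SHARDS):
    """Return the addresses of shards whose date range overlaps [start, end]."""
    return [addr for first, last, addr in table if first <= end and last >= start]
```

The resulting list is what would be joined with commas and passed as the 
shards parameter of the query.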

We have been running distributed queries without incident for several years 
now, but we only recently upgraded to 4.10.4 from 4.8.1.

The query is relatively simple and involves 4 shards, including the aggregator 
itself.

For a while, the server acting as the aggregator for the distributed query 
handles requests fine, but after an indeterminate amount of usage (in the 
range of 2-4 hours) it starts hanging on all distributed queries, while still 
serving non-distributed versions of the same query (no shards list included) 
quickly (9 ms).
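
For clarity, the only difference between the two kinds of request is the 
presence of the shards parameter. A rough sketch follows, in which the base 
URL, the query string and the build_query_url helper are placeholders rather 
than our real values:

```python
from urllib.parse import urlencode


def build_query_url(base, q, shards=None):
    """Build a Solr /select URL; adding `shards` makes the request distributed."""
    params = {"q": q, "wt": "json"}
    if shards:
        # Solr expects a comma-separated list of host:port/path shard addresses.
        params["shards"] = ",".join(shards)
    return base + "/select?" + urlencode(params)


# Hypothetical aggregator URL and shard list, for illustration only.
BASE = "http://solr1.example.com:8983/solr/core1"
local_url = build_query_url(BASE, "title:test")
distributed_url = build_query_url(
    BASE,
    "title:test",
    shards=[
        "solr1.example.com:8983/solr/core1",
        "solr2.example.com:8983/solr/core1",
    ],
)
```

In our case the request shaped like local_url keeps returning quickly, while 
the one shaped like distributed_url is the kind that hangs.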

CPU, heap, and system memory usage do not seem unusual compared to other servers.

I had initially suspected that distributed searches combined with faceting 
might be part of the issue, since I had seen some long-running threads that 
seemed to spend a long time in the FastLRUCache when getting facets for a 
single field. However, in the latest case of blocked queries, I am not seeing 
that.

We have two slaves that replicate from a master, and we saw the issue recur 
after a while of client usage, which seems to rule out a hardware issue.

Does anyone have any suggestions for potential avenues of attack for getting to 
the bottom of this? Or are there any known issues that could be implicated in 
this?

- Ronald S. Wood
