Hi there,

Unfortunately I don't agree with Shawn when he suggests raising maxThreads
in the server.xml configuration up to 10000. If Tomcat (due to the
concurrent overload you're suffering, the type of queries you're handling,
etc.) cannot manage the requested queries, what can happen is that Tomcat's
internal request queue fills up and an OutOfMemoryError may appear to say
hello to you.

Solr is multithreaded and so is Tomcat, but those Tomcat threads are
managed by an internal thread pool with a queue. What Tomcat does is
dispatch requests as fast as it can to the web applications deployed in it
(in this case, Solr). If Tomcat receives more requests than it can answer,
its internal queue starts to fill.
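As an illustration only (the numbers here are hypothetical, not a
recommendation), the two knobs involved on the AJP connector are maxThreads
(the worker pool size) and acceptCount (the queue that fills up when all
workers are busy):

```xml
<!-- Hypothetical server.xml fragment: maxThreads caps the worker pool,
     acceptCount bounds the queue of connections waiting for a free worker.
     Raising maxThreads to 10000 just moves the pressure into memory. -->
<Connector port="8080" protocol="AJP/1.3" address="127.0.0.1"
           maxThreads="500"
           acceptCount="100"
           connectionTimeout="60000"/>
```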

The timeouts from the client side that you described seem to happen because
the Tomcat thread pool and its queue are starting to fill up. You can check
this by monitoring Tomcat's memory and thread usage, and I'm sure you'll
see how it grows in correlation with the number of concurrent requests the
servers receive. Then, for sure, you'll see a more or less horizontal line
in memory usage while those timeouts appear on the client side.
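For example, one quick way to get that monitoring (a sketch; the port is
arbitrary and authentication is disabled here only for illustration) is to
expose the Tomcat JVM over JMX and watch heap usage and thread counts from
JConsole or VisualVM while load grows:

```shell
# Hypothetical addition to Tomcat's setenv.sh: expose the JVM over JMX
# so heap usage and thread pool size can be graphed under load.
CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```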

Basically I think the possible scenarios are:

   - Queries are slow. You should check and try to improve them, because
   maybe they are badly formed and those queries are destroying your
   performance. Also, check your index configuration (number of segments,
   etc.).
   - Queries are OK, but you receive more queries than you can handle. Your
   configuration and everything else is fine, but you are trying to consume
   more requests than you can dispatch and answer.

If you cannot improve your queries, or your queries are OK but you receive
more requests than you can handle, the only solution you have is to scale
horizontally and start up new Tomcat + Solr instances, going from 4 to N
nodes.
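In classic pre-SolrCloud Solr 4, adding nodes usually means bringing up
replicas that pull the index from a master and putting them behind the load
balancer. A sketch of the slave side of that setup (the master hostname is
hypothetical):

```xml
<!-- Hypothetical solrconfig.xml fragment for a new replica node:
     it polls the master for index changes every 60 seconds. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master:8080/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```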


Best,


- Luis Cappa

2015-05-19 15:57 GMT+02:00 Michael Della Bitta <mdellabi...@gmail.com>:

> Are you sure the requests are getting queued because the LB is detecting
> that Solr won't handle them?
>
> The reason why I'm asking is I know that ELB doesn't handle bursts well.
> The load balancer needs to "warm up," which essentially means it might be
> underpowered at the beginning of a burst. It will spool up more resources
> if the average load over the last minute is high. But for that minute it
> will definitely not be able to handle a burst.
>
> If you're testing infrastructure using a benchmarking tool that doesn't
> slowly ramp up traffic, you're definitely encountering this problem.
>
> Michael
>
>   Jani, Vrushank <vrushank.j...@truelocal.com.au>
>  2015-05-19 at 03:51
>
> Hello,
>
> We have production SOLR deployed on AWS Cloud. We have currently 4 live
> SOLR servers running on m3xlarge EC2 server instances behind ELB (Elastic
> Load Balancer) on AWS cloud. We run Apache SOLR in Tomcat container which
> is sitting behind Apache httpd. Apache httpd is using prefork mpm and the
> request flows from ELB to Apache Httpd Server to Tomcat (via AJP).
>
> Over the last few days, we are seeing an increase in requests, with
> around 20,000 requests a minute hitting the LB. In effect, we see the ELB
> Surge Queue Length continuously sitting around 100.
> Surge Queue Length: represents the total number of requests pending
> submission to the instances, queued by the load balancer.
>
> This is causing latencies and timeouts in client applications. Our first
> reaction was that we didn't have enough max connections set in either
> HTTPD or Tomcat. But what we saw is that the servers are very lightly
> loaded, with very low CPU and memory utilisation. The Apache prefork
> settings are as below on each server, with keep-alive turned off.
>
> <IfModule prefork.c>
> StartServers 8
> MinSpareServers 5
> MaxSpareServers 20
> ServerLimit 256
> MaxClients 256
> MaxRequestsPerChild 4000
> </IfModule>
>
>
> Tomcat server.xml has following settings.
>
> <Connector port="8080" protocol="AJP/1.3" address="127.0.0.1"
> maxThreads="500" connectionTimeout="60000"/>
> For HTTPD – we see lots of TIME_WAIT connections on the Apache port,
> around 7000+, but ESTABLISHED connections are around 20.
> For Tomcat – we see about 60 ESTABLISHED connections on the Tomcat AJP
> port.
>
> So the servers and connections don't look fully utilised to capacity.
> There is no visible stress anywhere. However, we still see requests being
> queued up on the LB because they cannot be served by the underlying
> servers.
>
> Can you please help me resolve this issue? Can you see any apparent
> problem here? Am I missing any configuration or settings for SOLR?
>
> Your help will be truly appreciated.
>
> Regards
> VJ
>
>
>
> Vrushank Jani
> Senior Java Developer
>

