> There is a non-trivial overhead for sharding: Using a single shard increases
> throughput. Have you tried with 1 shard to see if the latency is acceptable
> for that?
The nodes were unstable when we had a single-shard setup. They ran OOM frequently. The ops team set up a cronjob to clear out memory and increased swap space, but it wasn't stable and still caused random outages.

> First guess: You are updating too frequently and hitting multiple
> overlapping searchers, deteriorating performance which leads to more
> overlapping searchers and so on. Try looking in the log:
> https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

You're right. It's a heavy-index, heavy-read system with a 30-second soft commit and a 10-minute autoCommit. I will take a look at the link you gave. (A rough sketch of our commit settings is at the bottom of this mail.)

> Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on
> each? That is a total of 128 CPUs for your 4 machines, meaning that each
> CPU is working on about 10 concurrent requests. They might be competing for
> resources: Have you tried limiting the amount of concurrent request and
> using a queue? That might give you better performance (and lower heap
> requirements a bit).

There are 16 CPUs on each node. The requests are live, with direct upstream client impact, so they cannot be put in a queue (!!). We are planning to re-build based on data type to reduce the load, but that is still a few months away.

Also, in one of the other threads, Shawn Heisey mentions increasing maxThreads to 10000 and tweaking the process limit settings on the OS. I haven't dug deep enough into it yet to see whether it applies to this case, but according to his recommendation our setup seems to be running on minimum settings. Quoting Shawn:

> The maxThreads parameter in the Jetty config defaults to 200, and it is
> quite easy to exceed this. In the Jetty that comes packaged with Solr, this
> setting has been changed to 10000, which effectively removes the limit for
> a typical Solr install. Because you are running 4.4 and your message
> indicates you are using "service jetty" commands, chances are that you are
> NOT using the Jetty that came with Solr. The first thing I would try is
> increasing the maxThreads parameter to 10000.
>
> The process limit is increased in /etc/security/limits.conf. Here are the
> additions that I make to this file on my Solr servers, to increase the
> limits on the number of processes/threads and open files, both of which
> default to 1024:
>
> solr hard nproc 6144
> solr soft nproc 4096
> solr hard nofile 65535
> solr soft nofile 49151
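If I read Shawn's advice correctly, the change on our side would be in the ThreadPool section of jetty.xml, wherever our OS-packaged Jetty keeps it. Roughly like this, going by the jetty.xml that ships with Solr (a sketch, not our actual config, and the packaged Jetty may lay it out differently):

    <!-- jetty.xml: raise the ceiling on request-handling threads -->
    <Set name="ThreadPool">
      <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
        <Set name="minThreads">10</Set>
        <!-- default is 200; the Jetty bundled with Solr uses 10000 -->
        <Set name="maxThreads">10000</Set>
      </New>
    </Set>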
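Before editing limits.conf, I suppose we should check what the solr user actually gets today. Something like this should print the current limits (assuming the service runs as a user named "solr", as in Shawn's snippet; the -s is there in case the account has a nologin shell):

    # soft limits: max user processes (nproc), then open files (nofile)
    su - solr -s /bin/bash -c 'ulimit -u; ulimit -n'
    # hard limits for the same two
    su - solr -s /bin/bash -c 'ulimit -Hu; ulimit -Hn'

One thing I noticed while reading up on limits.conf: it only applies to new login sessions, so the Solr process would have to be restarted from a fresh session before new limits take effect.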
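And going back to your first guess about commits, here is roughly what I believe our settings look like in solrconfig.xml (a sketch from memory, not pasted from the live config):

    <!-- Hard commit every 10 minutes: flushes recent updates to disk
         but does not open a new searcher, so it is relatively cheap. -->
    <autoCommit>
      <maxTime>600000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- Soft commit every 30 seconds: opens a new searcher to make recent
         updates visible. Each new searcher triggers cache warming, which
         is where the overlapping onDeckSearchers from your link would
         pile up. -->
    <autoSoftCommit>
      <maxTime>30000</maxTime>
    </autoSoftCommit>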
Let me know what you think.

Thanks,
A

________________________________________
From: Toke Eskildsen <t...@statsbiblioteket.dk>
Sent: Friday, February 26, 2016 11:30 AM
To: solr_user lucene_apache
Subject: Re: Thread Usage

Azazel K <am_tech_mon...@outlook.com> wrote:
> We have solr cluster with 2 shards running 2 nodes on each shard.
> They are beefy physical boxes with index size of 162 GB , RAM of
> about 96 GB and around 153M documents.

There is a non-trivial overhead for sharding: Using a single shard increases
throughput. Have you tried with 1 shard to see if the latency is acceptable
for that?

> Two times this week we have seen the thread usage spike from the
> usual 1000 to 4000 on all nodes at the same time and bring down
> the cluster.

First guess: You are updating too frequently and hitting multiple overlapping
searchers, deteriorating performance which leads to more overlapping
searchers and so on. Try looking in the log:
https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on
each? That is a total of 128 CPUs for your 4 machines, meaning that each CPU
is working on about 10 concurrent requests. They might be competing for
resources: Have you tried limiting the amount of concurrent request and
using a queue? That might give you better performance (and lower heap
requirements a bit).

- Toke Eskildsen