> There is a non-trivial overhead for sharding: Using a single shard increases
> throughput. Have you tried with 1 shard to see if the latency is acceptable
> for that?
The nodes were unstable when we had a single-shard setup; they ran OOM
frequently. The ops team set up a cron job to clear out memory and increased
the swap space, but it still wasn't stable and caused random outages.
> First guess: You are updating too frequently and hitting multiple overlapping
> searchers, deteriorating performance which leads to more overlapping
> searchers and so on. Try looking in the log:
> https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F
You're right. It's a heavy-indexing, heavy-read system with a 30-second soft
commit and a 10-minute autoCommit. I will take a look at the link you gave.
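For reference, those intervals correspond to something like the following in
solrconfig.xml (a sketch based on the numbers above, not our exact config;
maxWarmingSearchers is the setting the FAQ entry revolves around):

```xml
<!-- solrconfig.xml sketch: 10 min hard commit, 30 sec soft commit -->
<autoCommit>
  <maxTime>600000</maxTime>          <!-- 10 minutes -->
  <openSearcher>false</openSearcher> <!-- hard commit only flushes; no new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime>           <!-- 30 seconds: each soft commit opens a new searcher -->
</autoSoftCommit>
<!-- Caps how many searchers may warm at once; warming more than one at a
     time is what produces the "Overlapping onDeckSearchers" warning -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```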
> Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on
> each? That is a total of 128 CPUs for your 4 machines, meaning that each CPU is
> working on about 10 concurrent requests. They might be competing for resources:
> Have you tried limiting the number of concurrent requests and using a queue?
> That might give you better performance (and lower heap requirements a bit).
There are 16 CPUs on each node. Requests are live with upstream client impact,
so they cannot be put in a queue (!!). We are planning to rebuild based on data
type to reduce the load, but that's still a few months away.
Also, in one of the other threads, Shawn Heisey recommends increasing the
Jetty thread limit and tweaking the process-limit settings on the OS. I
haven't dug deep into it yet to see if it applies to this case, but going by
his recommendation our setup seems to be running on minimum settings:
The maxThreads parameter in the Jetty config defaults to 200, and it is
quite easy to exceed this. In the Jetty that comes packaged with Solr,
this setting has been changed to 10000, which effectively removes the
limit for a typical Solr install. Because you are running 4.4 and your
message indicates you are using "service jetty" commands, chances are
that you are NOT using the jetty that came with Solr. The first thing I
would try is increasing the maxThreads parameter to 10000.
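For reference, the thread pool lives in etc/jetty.xml; in the Jetty bundled
with Solr 4.x it looks roughly like this (a sketch — element names and
defaults vary slightly across Jetty versions):

```xml
<!-- etc/jetty.xml thread pool sketch (Jetty 8 era, as shipped with Solr 4.x) -->
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <Set name="minThreads">10</Set>
      <Set name="maxThreads">10000</Set>
    </New>
  </Set>
</Configure>
```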
The process limit is increased in /etc/security/limits.conf. Here are
the additions that I make to this file on my Solr servers, to increase
the limits on the number of processes/threads and open files, both of
which default to 1024:
solr  hard  nproc   6144
solr  soft  nproc   4096
solr  hard  nofile  65535
solr  soft  nofile  49151
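To see what limits a Solr process actually inherits, you can check with
ulimit in a shell started as the user that launches Solr (each command prints
either a number or "unlimited"):

```shell
# Current limits for this user; compare against the limits.conf entries above.
ulimit -Su   # soft limit, max user processes/threads (nproc)
ulimit -Hu   # hard limit, max user processes/threads
ulimit -Sn   # soft limit, max open files (nofile)
ulimit -Hn   # hard limit, max open files
```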
Let me know what you think.
Thanks,
A
From: Toke Eskildsen
Sent: Friday, February 26, 2016 11:30 AM
To: solr_user lucene_apache
Subject: Re: Thread Usage
Azazel K wrote:
> We have solr cluster with 2 shards running 2 nodes on each shard.
> They are beefy physical boxes with index size of 162 GB , RAM of
> about 96 GB and around 153M documents.
There is a non-trivial overhead for sharding: Using a single shard increases
throughput. Have you tried with 1 shard to see if the latency is acceptable for
that?
> Two times this week we have seen the thread usage spike from the
> usual 1000 to 4000 on all nodes at the same time and bring down
> the cluster.
First guess: You are updating too frequently and hitting multiple overlapping
searchers, deteriorating performance which leads to more overlapping searchers
and so on. Try looking in the log:
https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F
Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on
each? That is a total of 128 CPUs for your 4 machines, meaning that each CPU is
working on about 10 concurrent requests. They might be competing for resources:
Have you tried limiting the number of concurrent requests and using a queue?
That might give you better performance (and lower heap requirements a bit).
- Toke Eskildsen
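Toke's queue suggestion could be sketched client-side like this (a
hypothetical throttle, not a Solr or Jetty API: a fixed pool caps in-flight
searches so excess callers wait in the pool's queue instead of piling
thousands of threads onto the servers; the pool size of 32 is an illustrative
starting point, a small multiple of the 16 CPUs per node):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical client-side throttle for search traffic.
public class ThrottledSearch {

    private final ExecutorService pool;

    public ThrottledSearch(int maxConcurrent) {
        // Fixed pool: at most maxConcurrent queries run at once;
        // the rest wait in the pool's internal queue.
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
    }

    public Future<String> submit(Callable<String> query) {
        return pool.submit(query);
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ThrottledSearch throttle = new ThrottledSearch(32);
        // Stand-in for an actual Solr query call.
        Future<String> f = throttle.submit(() -> "numFound=153000000");
        System.out.println(f.get());
        throttle.shutdown();
    }
}
```

The trade-off is exactly what the reply above notes: queuing adds latency for
live requests, so this only helps if callers can tolerate a short wait.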