Azazel K <[email protected]> wrote: > We have solr cluster with 2 shards running 2 nodes on each shard. > They are beefy physical boxes with index size of 162 GB , RAM of > about 96 GB and around 153M documents.
There is a non-trivial overhead for sharding: Using a single shard increases throughput. Have you tried with 1 shard to see if the latency is acceptable for that? > Two times this week we have seen the thread usage spike from the > usual 1000 to 4000 on all nodes at the same time and bring down > the cluster. First guess: You are updating too frequently and hitting multiple overlapping searchers, deteriorating performance which leads to more overlapping searchers and so on. Try looking in the log: https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on each? That is a total of 128 CPUs for your 4 machines, meaning that each CPU is working on about 10 concurrent requests. They might be competing for resources: Have you tried limiting the amount of concurrent request and using a queue? That might give you better performance (and lower heap requirements a bit). - Toke Eskildsen
