Thread Usage

2016-02-26 Thread Azazel K
Hi,


We have a Solr cluster with 2 shards, running 2 nodes per shard.  They are 
beefy physical boxes with an index size of 162 GB, about 96 GB of RAM, and 
around 153M documents.


Twice this week we have seen thread usage spike from the usual 1000 to 
4000 on all nodes at the same time and bring down the cluster.  We had to 
divert the traffic (search and update), perform a rolling restart each time, and 
put the nodes back in.  Has anyone faced this issue before?  We don't have any 
other process running on the boxes that could cause such a huge spike in thread 
usage on all nodes at the same time.


Any pointers appreciated.


Thanks

A


Re: Thread Usage

2016-02-26 Thread Azazel K
> There is a non-trivial overhead for sharding: Using a single shard increases 
> throughput. Have you tried with 1 shard to see if the latency is acceptable 
> for that?

The nodes were unstable when we had a single-shard setup.  It ran out of memory 
(OOM) frequently.  The ops team set up a cron job to clear out memory and 
increased swap space, but it still wasn't stable and caused random outages.

> First guess: You are updating too frequently and hitting multiple overlapping 
> searchers, deteriorating performance which leads to more overlapping 
> searchers and so on. Try looking in the log:
> https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

You're right.  It's a heavy-index, heavy-read system with a 30-second soft 
commit and a 10-minute autoCommit.  I will take a look at the link you gave.
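
For anyone following along: those intervals live in solrconfig.xml.  A minimal 
sketch (the two maxTime values match what I described above; the rest is 
illustrative, not our actual config):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 10 minutes; flushes to disk without opening a searcher -->
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit every 30 seconds; this is what opens new searchers -->
    <maxTime>30000</maxTime>
  </autoSoftCommit>
</updateHandler>

<!-- in the <query> section: keep this low so overlapping warming searchers
     fail fast instead of piling up -->
<maxWarmingSearchers>2</maxWarmingSearchers>

With openSearcher=false on the hard commit, only the 30-second soft commits open 
searchers, so if the warning from the FAQ shows up, the soft commit interval (or 
expensive warming queries) is the first thing to look at.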

> Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on 
> each? That is a total of 128 CPUs for your 4 machines, meaning that each CPU is 
> working on about 10 concurrent requests. They might be competing for resources: 
> Have you tried limiting the number of concurrent requests and using a queue? 
> That might give you better performance (and lower heap requirements a bit).

There are 16 CPUs on each node.  Requests are served live, with direct upstream 
client impact, so they cannot be put in a queue(!!).  We are planning to rebuild 
based on data type to reduce the load, but that's still a few months away.

Also, in one of the other threads, Shawn Heisey mentions increasing maxThreads 
to 10000 and tweaking the process limit settings on the OS.  I haven't dug deep 
into it yet to see if it applies to this case, but according to his 
recommendation our setup seems to be running on minimum settings.

The maxThreads parameter in the Jetty config defaults to 200, and it is
quite easy to exceed this.  In the Jetty that comes packaged with Solr,
this setting has been changed to 10000, which effectively removes the
limit for a typical Solr install.  Because you are running 4.4 and your
message indicates you are using "service jetty" commands, chances are
that you are NOT using the Jetty that came with Solr.  The first thing I
would try is increasing the maxThreads parameter to 10000.
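
For reference, the thread pool is configured in etc/jetty.xml.  In the Jetty 8 
bundled with Solr 4.x the stock section looks roughly like this (a standalone 
Jetty can be adjusted the same way):

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <!-- keep a handful of idle threads warm -->
    <Set name="minThreads">10</Set>
    <!-- effectively unlimited for a typical Solr install -->
    <Set name="maxThreads">10000</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>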

The process limit is increased in /etc/security/limits.conf.  Here are
the additions that I make to this file on my Solr servers, to increase
the limits on the number of processes/threads and open files, both of
which default to 1024:

solr  hard  nproc   6144
solr  soft  nproc   4096

solr  hard  nofile  65535
solr  soft  nofile  49151

Let me know what you think.

Thanks,
A

From: Toke Eskildsen 
Sent: Friday, February 26, 2016 11:30 AM
To: solr_user lucene_apache
Subject: Re: Thread Usage

Azazel K  wrote:
> We have solr cluster with 2 shards running 2 nodes on each shard.
> They are beefy physical boxes with index size of 162 GB , RAM of
> about 96 GB and around 153M documents.

There is a non-trivial overhead for sharding: Using a single shard increases 
throughput. Have you tried with 1 shard to see if the latency is acceptable for 
that?

> Two times this week we have seen the thread usage spike from the
> usual 1000 to 4000 on all nodes at the same time and bring down
> the cluster.

First guess: You are updating too frequently and hitting multiple overlapping 
searchers, deteriorating performance which leads to more overlapping searchers 
and so on. Try looking in the log:
https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F


Anyway, 1000 threads sounds high. How many CPUs are on your machines? 32 on 
each? That is a total of 128 CPUs for your 4 machines, meaning that each CPU is 
working on about 10 concurrent requests. They might be competing for resources: 
Have you tried limiting the number of concurrent requests and using a queue? 
That might give you better performance (and lower heap requirements a bit).
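
Something like this in etc/jetty.xml (Jetty 8, as bundled with Solr 4.x) would 
be one way to try it; the numbers below are purely illustrative:

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <!-- cap concurrent request-handling threads well below the spike level -->
    <Set name="maxThreads">256</Set>
  </New>
</Set>

<Call name="addConnector">
  <Arg>
    <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
      <Set name="port">8983</Set>
      <!-- TCP accept backlog: connections wait here while all threads are busy -->
      <Set name="acceptQueueSize">512</Set>
    </New>
  </Arg>
</Call>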

- Toke Eskildsen


Reindexing

2015-08-19 Thread Azazel K
Hi,
We have an over-engineered index that we would like to rework.  It's already 
holding 150M documents with 94 GB of index size.  We have a high-index/high-query 
system running Solr 4.5.

My question: if we update the schema, can we reindex by using the "Reload" 
action in the CoreAdmin UI?  Will that regenerate the index according to the 
schema updates?
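
(To be specific, the reload I mean is the CoreAdmin action, issued as e.g.
http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
-- the host and core name here are illustrative.)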
Thanks,
Az

Solr Split

2017-03-17 Thread Azazel K
Hi,


We have a Solr index running on 4.5.0 that we are trying to upgrade to 4.7.2 
and split the shard.


The uniqueKey is a TrieLongField, and its values are always negative:


In prod (2 shards, 1 replica for each shard):

Max : -9223372035490849922
Min : -9223372036854609508


In lab (1 shard, 1 replica): negative values between -21339 and 
-9223372036854687955, and a couple of documents with positive values.


To test whether this will work, I executed the following steps in the lab on new 
instances containing a Solr/ZK cluster:


1. From the old cluster (C1), zip up the index while the server is not running.  
We are not going to the source to re-index into the new cluster (C2), as we 
don't own the data (yes, that's being addressed).

2. On the new cluster (C2), create a new collection.

3. Stop the Tomcat server in the new cluster (C2).

4. Overwrite the new cluster's index (C2) with the original cluster's index (C1).

5. Start the server and optimize (C2).

6. Num docs match the original cluster and everything seems to work. 
Total documents: 4021887 (C2 and C1).


Then:


1. Split the shard in the new cluster (C2); the Collections API call used is 
shown after this list.  For 1.5 GB it takes around 6 minutes to complete.

2. Two shards are created, with hash ranges (0-7fffffff and 80000000-ffffffff). 
The original range was 80000000-7fffffff.

Num docs for hash range "0-7fffffff" is 4021886, and for "80000000-ffffffff" 
it is 3680519.


Apparently, both shards contain a lot of duplicate documents.  The index is not 
properly split at all.  I tried this twice with the same result.  What might be 
the issue here?


Any pointers really appreciated.


Azazel


Solr Split not working

2017-03-20 Thread Azazel K
Hi,

We have a Solr index running on 4.5.0 that we are trying to upgrade to 4.7.2 
and split the shard.

The uniqueKey is a TrieLongField, and its values are always negative:

Max : -9223372035490849922
Min : -9223372036854609508


When we copy the Solr 4.5.0 index to a new cluster running 4.7.2 and split the 
index, the data is duplicated in both new shards.


Any ideas why?