On 12/26/2013 9:56 AM, Sir Gilligan wrote:
> I have three CentOS machines running Solr 4.6.0 cloud without any
> replication. That is, numShards is 3 and there is only one Solr instance
> running on each of the boxes.
>
> Also, on the boxes I am running ZooKeeper. This is a test environment
> and I would not normally run ZooKeeper on the same boxes.
>
> As I am inserting data into Solr the boxes get in a weird state. I will
> log in and enter my username and password and then nothing, it just sits
> there. I am connected through Putty. Never gets to a command prompt. I
> stop the data import and after a while I can log in.
>
> I do the following command on one of the boxes and I see this:
>
> ps -lf -C java
<snip>
> How did I end up with two child processes of Solr running? Notice they
> are two PIDS, 7879 and 7949, that are children of 5009. The exact same
> command as well, with all of the parameters I used to launch Solr.
>
> I also notice the "F" state is "1" for those two processes, so I assume
> that means "forked but didn't exec".
>
> Also the WCHAN is sched_ on both of them.
>
> The "S" state is "D" which means uninterruptible sleep (usually IO).
>
> Where are these processes coming from? Do I have something configured
> incorrectly?

Solr itself should not fork processes, or at least I have never seen it
do so. It does appear that you are using 'start.jar', which suggests
that you're using the Jetty that comes bundled with Solr, although I
cannot tell that for sure. If you are using some other container
(including another version/copy of Jetty), then I have no idea what it
might do.

I ran the same ps command on one of my CentOS 6 SolrCloud (4.2.1)
machines and I get exactly two entries - one for zookeeper and one for
Solr (running the included Jetty). If on the other hand I run a ps
command that shows threads, I see a LOT of entries for both zookeeper
and java, because these are highly threaded applications. (There's an
example of a thread-level ps command at the end of this message.) I
have a much larger Solr install that's not using SolrCloud, and I have
never seen it fork processes either. My dev install (running 4.6.0 in
non-cloud mode) also doesn't fork processes.

Side notes:

As long as the machine has enough resources available, running
zookeeper on the same boxes as Solr shouldn't pose a problem. If the
machine becomes heavily I/O bound and zookeeper data is not on separate
spindles, it might be a problem.

The bootstrap options are not meant to run on every startup. They
should not be used except when first converting a non-cloud install to
a cloud install. If you want to upload a new configuration to
zookeeper, you can use the zkcli script in cloud-scripts and then
reload your collection (example commands at the end of this message).

Also, I think it's generally not a good idea to use the numShards
startup parameter. You can indicate the number of shards for a
collection when you create the collection (again, see the example at
the end of this message).

With a 12GB heap, you're definitely going to want to tune your garbage
collection. I don't see any tuning parameters on your commandline. I'd
like to avoid a religious garbage collection flame-war, so I will give
you the settings that work for me and allow you to decide for yourself
what to do:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Here's some more generic information about performance problems with
Solr:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
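
P.S. Here is the kind of ps command I was referring to for the
thread-level view. This is what works for me on CentOS 6; the exact
options may vary with other ps versions, so treat it as a starting
point rather than gospel:

  # -L adds one line per thread (LWP), -C selects by command name
  ps -Lf -C java

Because Solr and zookeeper run hundreds of threads, this output will be
much longer than the plain process listing you posted.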
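
The zkcli/reload sequence I mentioned looks roughly like this. It's a
sketch, not copied from a working script - the zkhost string, paths,
config name, and collection name are all placeholders you'd replace
with your own values:

  # upload (or replace) a named configset in zookeeper
  example/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -cmd upconfig -confdir /path/to/myconf/conf -confname myconf

  # tell Solr to reload the collection so it picks up the new config
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"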
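
Similarly, instead of the numShards startup parameter, you can give the
shard count to the Collections API when you create the collection.
Again, just a sketch with placeholder names:

  # create a 3-shard collection with no replication, using the uploaded config
  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=1&collection.configName=myconf"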