On 2/14/2014 2:45 AM, Jared Rodriguez wrote:
> Thanks for the info, I will look into the open file count and try to
> provide more info on how this is occurring.
> 
> Just to make sure that our scenarios were the same, in your tests did you
> simulate many concurrent inbound connections to your web app, with each
> connection sharing the same instance of HttpSolrServer for queries?

I've bumped the max open file limit (in /etc/security/limits.conf on
CentOS) to a soft/hard limit of 49151/65535.  I've also bumped the
process limits to 4096/6144.  These are specific to the user that runs
Solr and other related programs.
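
For reference, the limits.conf entries look something like this (the
"solr" username is just a placeholder for whatever account actually runs
the programs):

    solr    soft    nofile    49151
    solr    hard    nofile    65535
    solr    soft    nproc     4096
    solr    hard    nproc     6144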

My SolrJ program is not actually a web application.  It is my indexing
application, a standalone java program.  We do use SolrJ in our web
application, but that's handled by someone else.  I do know that it uses
a single HttpSolrServer instance across the entire webapp.

When this specific copy of the indexing application (for my dev server)
starts up, it creates 15 HttpSolrServer instances that are used for the
life of the application.  The application will run for weeks or months
at a time and has never had a problem with leaks.

One of these instances points at the /solr URL, which I use for
CoreAdminRequest queries.  Each of the other 14 points at one of the Solr
cores.  My production copy, which has a config file to update two copies
of the index on four servers, creates 32 instances -- four of them for
CoreAdmin requests and 28 of them for cores.
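
In rough outline, the startup code does something like this -- the URL,
core names, and the loadCoreNames() helper are illustrative only, since
the real values come from the application's config:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    String baseUrl = "http://localhost:8983/solr";  // placeholder URL

    // One instance aimed at the /solr URL, used only for CoreAdmin requests.
    HttpSolrServer adminServer = new HttpSolrServer(baseUrl);

    // One instance per core, created once and reused for the life of the app.
    String[] coreNames = loadCoreNames();  // hypothetical helper; real list is in config
    Map<String, HttpSolrServer> coreServers = new HashMap<String, HttpSolrServer>();
    for (String coreName : coreNames) {
      coreServers.put(coreName, new HttpSolrServer(baseUrl + "/" + coreName));
    }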

Updates are run once a minute.  One cycle will typically involve several
Solr requests.  Sometimes they are queries, but most of the time they
are update requests.

The application uses database connection pooling (Apache Commons code)
to talk to a MySQL server, pulls in data for indexing, and then sends
requests to Solr.  Most of the time, it only goes to one HttpSolrServer
instance, the core where all new data lives.  Occasionally it will talk
to up to seven of the 14 HttpSolrServer instances -- the ones pointing
at the "live" cores.

When a full rebuild is underway, it starts the dataimport handler on the
seven build cores.  As part of the once-a-minute update cycle, it also
gathers status information on those dataimports.  When the rebuild
finishes, it runs an update on those cores and then does CoreAdmin SWAP
requests to switch to the new index.
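
Both the status check and the swap are plain SolrJ requests; roughly like
this, reusing adminServer and coreServers from the sketch above, with
made-up core names and the default /dataimport handler path assumed:

    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    // Once a minute during a rebuild: ask a build core's dataimport
    // handler for its status.
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "status");
    QueryRequest statusReq = new QueryRequest(params);
    statusReq.setPath("/dataimport");
    NamedList<Object> status = coreServers.get("s0build").request(statusReq);

    // When the rebuild finishes: swap the build core into the live
    // position, using the instance pointed at the /solr URL.
    CoreAdminRequest.swapCore("s0live", "s0build", adminServer);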

I did run a rebuild, and I let the normal indexing run for a really long
time, so I could be sure that it was using all HttpSolrServer instances.
It never had more than a few dozen connections listed in the netstat
output.

Thanks,
Shawn
