On 3/24/2015 11:22 AM, Ian Rose wrote:
> Let me give a bit of background.  Our Solr cluster is multi-tenant, where
> we use one collection for each of our customers.  In many cases, these
> customers are very tiny, so their collection consists of just a single
> shard on a single Solr node.  In fact, a non-trivial number of them are
> totally empty (e.g. trial customers that never did anything with their
> trial account).  However there are also some customers that are larger,
> requiring their collection to be sharded.  Our strategy is to try to keep
> the total documents in any one shard under 20 million (honestly not sure
> where my coworker got that number from - I am open to alternatives but I
> realize this is heavily app-specific).
>
> So my original question is not related to indexing or query traffic, but
> just the sheer number of cores.  For example, if I have 10 active cores on
> a machine and everything is working fine, should I expect that everything
> will still work fine if I add 10 nearly-idle cores to that machine?  What
> about 100?  1000?  I figure the overhead of each core is probably fairly
> low but at some point starts to matter.

One resource that may be exhausted faster than any other when you have a
lot of cores on a Solr instance (especially when they are not idle) is
Java heap memory, so you might need to increase the heap.  Memory in the
server is one of the most important resources you have for Solr
performance, and here I am talking about memory that is *not* allocated
to the Java heap (or to any other program) -- the OS must be able to
effectively cache your index data or Solr performance will be terrible.
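
As a concrete illustration (the numbers here are made up and depend
entirely on your hardware and index size): on a machine with 16GB of
RAM, you might start Solr with a 4GB heap and leave the rest for the OS
disk cache.  This assumes the old-style start.jar launch from the
example directory:

  # 4GB heap; the remaining ~12GB stays free for the OS to cache index data
  java -Xms4g -Xmx4g -jar start.jar

If the heap is too small you'll see OutOfMemoryError or constant garbage
collection; if it's too large, you starve the OS disk cache.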

You have said "Solr cluster" and "collection" ... so that makes me think
you're running SolrCloud.  In cloud mode, you can't really use the
LotsOfCores functionality, where you mark cores transient and tell Solr
how many cores you'd like to have resident at the same time.  If you are
NOT in cloud mode, then you can use this feature:

http://wiki.apache.org/solr/LotsOfCores
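
If you can use it, the setup is small.  Here's a rough sketch, assuming
core discovery mode (Solr 4.4 or later); the cache size of 50 is an
arbitrary example:

  # in each core's core.properties:
  transient=true
  loadOnStartup=false

  <!-- in solr.xml: how many transient cores may stay loaded at once -->
  <int name="transientCacheSize">50</int>

Cores beyond that limit are unloaded on an LRU basis and reloaded on
demand, which is the behavior you'd want for mostly-idle trial
customers.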

In general, there are three resources other than memory which might
become exhausted with a large number of cores:

One resource is the "maximum open files" limit in the operating system,
which typically defaults to 1024 per process.  Each core's index
usually contains several dozen files, so a few dozen cores can easily
push a Solr process past 1024 open files.
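
To check and raise the limit on Linux (a sketch -- the values are
examples, and I'm assuming Solr runs as a user named "solr"):

  # show the current limit for the logged-in user
  ulimit -n

  # add to /etc/security/limits.conf, then log in again:
  solr  soft  nofile  49152
  solr  hard  nofile  65536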

The second resource is the maximum allowed threads in your servlet
container config -- each core you add requires more threads.  The
default maxThreads value in most containers is 200.  The Jetty container
included in the Solr download is preconfigured with a maxThreads value
of 10000, effectively removing the limit for most setups.
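
If you run a different container, look for its equivalent of this
setting.  In the Jetty that ships with Solr it looks roughly like this
(the surrounding structure varies between Jetty versions, so treat this
as a sketch, not exact config):

  <!-- etc/jetty.xml -->
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="maxThreads">10000</Set>
  </New>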

The third resource is related to the second -- some operating systems
implement threads as lightweight processes, and many of them limit the
number of processes that a single user may start.  On Linux, this limit
is typically 1024, and may need to be increased.
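
On Linux you can see this limit with "ulimit -u" and raise it in
limits.conf, just like the open file limit (again assuming a user named
"solr"; the number is only an example):

  # show the per-user process/thread limit
  ulimit -u

  # add to /etc/security/limits.conf:
  solr  soft  nproc  8192
  solr  hard  nproc  8192

Note that some distributions override nproc in a file under
/etc/security/limits.d/, so check there as well.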

I really need to add this kind of info to the wiki.

Thanks,
Shawn
