On 3/24/2015 11:22 AM, Ian Rose wrote:
> Let me give a bit of background. Our Solr cluster is multi-tenant, where
> we use one collection for each of our customers. In many cases, these
> customers are very tiny, so their collection consists of just a single
> shard on a single Solr node. In fact, a non-trivial number of them are
> totally empty (e.g. trial customers that never did anything with their
> trial account). However there are also some customers that are larger,
> requiring their collection to be sharded. Our strategy is to try to keep
> the total documents in any one shard under 20 million (honestly not sure
> where my coworker got that number from - I am open to alternatives but I
> realize this is heavily app-specific).
>
> So my original question is not related to indexing or query traffic, but
> just the sheer number of cores. For example, if I have 10 active cores on
> a machine and everything is working fine, should I expect that everything
> will still work fine if I add 10 nearly-idle cores to that machine? What
> about 100? 1000? I figure the overhead of each core is probably fairly
> low but at some point starts to matter.
One resource that may be exhausted faster than any other when you have a lot of cores on a Solr instance (especially when they are not idle) is Java heap memory, so you might need to increase the Java heap. Memory in the server is one of the most important resources you have for Solr performance, and here I am talking about memory that is *not* used by the Java heap (or any other program) -- the OS must be able to effectively cache your index data or Solr performance will be terrible.

You have said "Solr cluster" and "collection" ... so that makes me think you're running SolrCloud. In cloud mode, you can't really use the LotsOfCores functionality, where you mark cores transient and tell Solr how many cores you'd like to have resident at the same time. If you are NOT in cloud mode, then you can use this feature:

http://wiki.apache.org/solr/LotsOfCores

In general, there are three resources other than memory which might become exhausted with a large number of cores:

The first is the "maximum open files" limit in the operating system, which typically defaults to 1024. Each core will typically have several dozen files in its index, so it's very easy to reach 1024 open files.

The second is the maximum number of threads allowed by your servlet container config -- each core you add requires more threads. The default maxThreads value in most containers is 200. The Jetty container included in the Solr download is preconfigured with a maxThreads value of 10000, effectively removing the limit for most setups.

The third is related to the second -- some operating systems implement threads as hidden processes, and many operating systems limit the number of processes that a user may start. On Linux, this limit is typically 1024, and may need to be increased. Example settings for all of these are in the P.S. below.

I really need to add this kind of info to the wiki.

Thanks,
Shawn
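P.S. A few concrete sketches for the settings mentioned above. All of the numbers are placeholders, not recommendations. For the heap, however you launch Solr, the standard JVM options control the size -- with the example Jetty it looks like this:

  # Pin min and max heap to the same size so the heap never resizes.
  # 4g is only a placeholder -- size it for your own install.
  java -Xms4g -Xmx4g -jar start.jar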
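If you are NOT in cloud mode and want to try LotsOfCores, the setup is roughly this -- the core name below is made up, and the wiki page above has the authoritative details:

  # core.properties for each transient core:
  name=customer0001
  transient=true
  loadOnStartup=false

  # solr.xml -- how many transient cores may stay loaded at once:
  <int name="transientCacheSize">100</int>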
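For the open file and process limits on Linux, entries like these in /etc/security/limits.conf raise both -- this assumes Solr runs as a user named 'solr', and the numbers are arbitrary examples:

  solr  soft  nofile  49152
  solr  hard  nofile  65536
  solr  soft  nproc   4096
  solr  hard  nproc   8192

You can check the current limits for the user running Solr with ulimit -n (open files) and ulimit -u (processes).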
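The thread limit lives in etc/jetty.xml in the Solr download, and looks roughly like this (other containers such as Tomcat have an equivalent maxThreads setting on the connector):

  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <Set name="maxThreads">10000</Set>
    </New>
  </Set>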