Because you use individual collections, you don't have to get everything right up front.
Each collection can be created on a specified set of nodes; see the "createNodeSet" parameter of the Collections API CREATE command. And say you figure out later that you need more hardware and want to move some of your existing collections onto it: use the MOVEREPLICA API command.

For example, say you start out with one machine hosting 500 collections. You get more and more clients and the machine becomes overloaded, or one of your collections grows disproportionately to the others. You spin up a new machine and MOVEREPLICA some number of replicas from the original machine to the new hardware.

Also consider that at some point it may be desirable to have multiple "pods". By that I mean it can get awkward to have thousands of collections hosted on a single ZooKeeper ensemble. Again, because you have individual collections, you can declare one "pod" (ZooKeeper + Solr nodes) full and spin up another one, i.e. totally separate hardware and a separate ZK ensemble. The pods don't know about each other at all.

Best,
Erick

> On Dec 6, 2019, at 3:12 AM, Vignan Malyala <dsmsvig...@gmail.com> wrote:
>
> Hi Shawn,
>
> Thanks for your response!
>
> Yes! 500 collections.
> Each collection/core has around 50k to 50L documents/JSONs (depending on
> the client). We made one core for each client. Each JSON has 15 fields.
> It is already in production as a standalone Solr server.
> We want to use SolrCloud now, to make it scalable for the future.
> How do I make that possible?
>
> From your response, I understood that I have to create 3 ZooKeeper
> instances and some machines that house one Solr node each.
> Is that the optimal solution? *And how many machines do I need to build
> to house the Solr nodes, keeping in mind 500 collections?*
>
> Thanks in advance!
>
> On Fri, Dec 6, 2019 at 11:44 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 12/5/2019 12:28 PM, Vignan Malyala wrote:
>>> I currently have 500 collections in my standalone Solr. Because of the
>>> day-by-day increase in data, I want to convert it to SolrCloud.
>>> Can you suggest how to do it successfully?
>>> How many shards should there be?
>>> How many nodes should there be?
>>> Are the so-called nodes different machines I should take?
>>> How many ZooKeeper nodes should there be?
>>> Are the so-called ZooKeeper nodes different machines I should take?
>>> In total, how many machines do I have to take to implement a scalable SolrCloud?
>>
>> 500 collections is large enough that running it in SolrCloud is likely
>> to encounter scalability issues. SolrCloud's design does not do well
>> with that many collections in the cluster, even if there are a lot of
>> machines.
>>
>> There's a lot of comment history on this issue:
>>
>> https://issues.apache.org/jira/browse/SOLR-7191
>>
>> Generally speaking, each machine should only house one Solr node,
>> whether you're running cloud or not. If each one requires a really huge
>> heap, it might be worthwhile to split it, but that's the only time I
>> would do so. And I would generally prefer to add more machines than to
>> run multiple Solr nodes on one machine.
>>
>> One thing you might do, if the way your data is divided will permit it,
>> is to run multiple SolrCloud clusters. Multiple clusters can all use
>> one ZooKeeper ensemble.
>>
>> ZooKeeper requires a minimum of three machines for fault tolerance.
>> With 3 or 4 machines in the ensemble, you can survive one machine
>> failure. To survive two failures requires at least 5 machines.
>>
>> Thanks,
>> Shawn
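P.S. The CREATE/createNodeSet and MOVEREPLICA calls described above can be sketched as Collections API requests. This is a minimal illustration only: the host `localhost:8983`, the collection name `client_a`, and the node/replica names are made-up placeholders, not values from this thread. It just builds the request URLs rather than sending them, so you can see the parameters involved.

```python
# Sketch of the two Collections API calls discussed above.
# Hostname, collection, node, and replica names below are illustrative.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/admin/collections"

def create_collection(name, node_set):
    # CREATE with createNodeSet pins the new collection's replicas
    # to specific nodes instead of letting Solr pick nodes itself.
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "createNodeSet": ",".join(node_set),
    }
    return SOLR + "?" + urlencode(params)

def move_replica(collection, replica, target_node):
    # MOVEREPLICA relocates an existing replica onto new hardware,
    # e.g. when the original machine becomes overloaded.
    params = {
        "action": "MOVEREPLICA",
        "collection": collection,
        "replica": replica,
        "targetNode": target_node,
    }
    return SOLR + "?" + urlencode(params)

print(create_collection("client_a", ["node1:8983_solr"]))
print(move_replica("client_a", "core_node3", "node2:8983_solr"))
```

In practice you would send these URLs with any HTTP client (or use SolrJ); the point is that each collection is created and moved independently, which is what makes the one-collection-per-client layout easy to rebalance later.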