Thanks Erick. I agree SolrCloud is better than master/slave; however, we have some questions about managing replicas ourselves versus with SolrCloud. For example, how much overhead do SolrCloud nodes have with respect to memory, CPU, and disk in order to sync pending index updates to other replicas? What monitoring and safeguards are in place out of the box so that too many pending updates for unreachable replicas don't make the live ones fall over, or so that a new replica doesn't overwhelm an existing one?
Of course everything works great when things are running well, but when things go south our first priority would be for Solr not to fall over.

On Fri, Dec 15, 2017 at 9:41 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> The main advantage in SolrCloud in your setup is HA/DR. You say you
> have multiple replicas and shards. Either you have to index to each
> replica separately or you use master/slave replication. In either case
> you have to manage and fix the case where some node goes down. If
> you're using master/slave, if the master goes down you need to get in
> there and fix it, reassign the master, make config changes, restart
> Solr to pick them up, make sure you pick up any missed updates and all
> that.
>
> In SolrCloud that is managed for you. Plus, let's say you want to
> increase QPS capacity. In SolrCloud all you do is use the collections
> API ADDREPLICA command and you're done. It gets created (and you can
> specify exactly what node if you want), the index gets copied, new
> updates are automatically routed to it and it starts serving requests
> when it's synchronized all automagically. Symmetrically you can
> DELETEREPLICA if you have too much capacity.
>
> The price here is you have to get comfortable with maintaining
> ZooKeeper admittedly.
>
> Also in the 7x world you have different types of replicas, TLOG, PULL
> and NRT that combine some of the features of master/slave with
> SolrCloud.
>
> Generally my rule of thumb is the minute you get beyond a single shard
> you should move to SolrCloud. If all your data fits in one Solr core
> then it's less clear-cut, master/slave can work just fine. It Depends
> (tm) of course.
>
> Your use case is "implicit" (being renamed "manual") routing when you
> create your Solr collection. There are pros and cons here, but that's
> beyond the scope of your question. Your infrastructure should port
> pretty directly to SolrCloud. The short form is that all your indexing
> and/or querying is happening on a single node when using manual
> routing rather than in parallel. Of course executing parallel
> sub-queries imposes its own overhead.....
>
> If your use-case for having these on a single shard is to segregate
> the data by some set (say users), you might want to consider just
> using separate _collections_ in SolrCloud where old_shard ==
> new_collection, basically all your routing is the same. You can create
> aliases pointing to multiple collections or specify multiple
> collections on the query, don't know if that fits your use case or not
> though.
>
> Best,
> Erick
>
> On Fri, Dec 15, 2017 at 9:03 AM, John Davis <johndavis925...@gmail.com>
> wrote:
> > Hello,
> > We are thinking about migrating to SolrCloud. Our current setup is:
> > 1. Multiple replicas and shards.
> > 2. Each query typically hits a single shard only.
> > 3. We have an external system that assigns a document to a shard based on
> > its origin and is also used by solr clients when querying to find the
> > correct shard to query.
> >
> > It looks like the biggest advantage of SolrCloud is #3 - to route document
> > to the correct shard & replicas when indexing and to route query similarly.
> > Given we already have a fairly reliable system to do this, are there other
> > benefits from migrating to SolrCloud?
> >
> > Thanks,
> > John
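
For anyone following the thread, here is a rough sketch of what the Collections API calls mentioned in the quoted reply (ADDREPLICA, DELETEREPLICA, aliases, and querying several collections at once) might look like as request URLs. Everything specific is an assumption for illustration: the base URL, the collection names (`docs`, `users_a`, `users_b`), the shard, node, and replica names, and the helper functions themselves. The code only builds the URLs and does not contact a Solr node.

```python
from urllib.parse import urlencode

# Assumption: any node in the cluster can take Collections API requests.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def select_url(collection, q, collections=None):
    """Build a /select URL, optionally searching several collections."""
    params = {"q": q}
    if collections:
        # The `collection` request parameter takes a comma-separated list.
        params["collection"] = ",".join(collections)
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


# Add query capacity: one more replica of shard1 on a chosen node.
# In 7.x, `type` selects the replica type (nrt, tlog, or pull).
add_replica = collections_api_url(
    "ADDREPLICA", collection="docs", shard="shard1",
    node="host2:8983_solr", type="pull",
)

# Remove capacity again (replica name is hypothetical).
delete_replica = collections_api_url(
    "DELETEREPLICA", collection="docs", shard="shard1", replica="core_node3",
)

# One collection per former shard, with an alias spanning them.
create_alias = collections_api_url(
    "CREATEALIAS", name="all_users", collections="users_a,users_b",
)

# Alternatively, name the collections directly on the query.
multi_query = select_url("users_a", "origin:foo", ["users_a", "users_b"])

for url in (add_replica, delete_replica, create_alias, multi_query):
    print(url)
```

With per-origin collections, the external system that currently picks a shard would pick a collection name instead, which keeps the existing routing logic mostly unchanged.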