Thanks, Erick, for the confirmation.

On Apr 18, 2016 5:48 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> In short, I'm afraid I have to agree with your IT guy.
>
> I like SolrCloud, it's waaaay cool. But in your situation I really
> can't say it's compelling.
>
> The places SolrCloud shines: automatically routing docs to shards...
> You're not sharding.
>
> Automatically electing a new leader (analogous to master)... You
> don't care since the pain of reindexing is so little.
>
> Not losing data when a leader/master goes down during indexing... You
> don't care since you can reindex quickly and you're indexing so
> rarely.
>
> In fact, I'd also optimize the index, something I rarely recommend.
>
> Even the argument that you get to use all your nodes for searching
> doesn't really pertain since you can index on a node, then just copy
> the index to all your nodes, you could get by without even configuring
> master/slave. Or just, as you say, index to all your Solr nodes
> simultaneously.
>
> About the only downside is that you've got to create your Solr nodes
> independently, making sure the proper configurations are on each one
> etc, but even if those changed 2-3 times a year it's hardly onerous.
>
> You _are_ getting all the latest and greatest indexing and search
> improvements, all the SolrCloud stuff is built on top of exactly the
> Solr you'd get without using SolrCloud.
>
> And finally, there is certainly a learning curve to SolrCloud,
> particularly in this case the care and feeding of Zookeeper.
>
> The instant you need to have shards, the argument changes quite
> dramatically. The argument changes some under significant indexing
> loads. The argument totally changes if you need low latency. It
> doesn't sound like your situation is sensitive to any of these
> though....
>
> Best,
> Erick
>
> On Apr 18, 2016 10:41 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
> >
> > Nice - thanks Daniel.
> >
> > On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:
> >
> > > One thing I like about SolrCloud is that I don't have to configure
> > > Master/Slave replication in each "core" the same way to get them to
> > > replicate.
> > >
> > > The other thing I like about SolrCloud, which is largely theoretical at
> > > this point, is that I don't need to test changes to a collection's
> > > configuration by bringing up a whole new solr on a whole new server -
> > > SolrCloud already virtualizes this, and so I can make up a random
> > > collection name that doesn't conflict, and create the thing, and smoke
> > > test with it. I know that standard practice is to bring up all new
> > > nodes, but I don't see why this is needed.
> > >
> > > -----Original Message-----
> > > From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> > > Sent: Monday, April 18, 2016 1:23 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Verifying - SOLR Cloud replaces load balancer?
> > >
> > > So - my IT guy makes the case that we don't really need Zookeeper /
> > > Solr Cloud...
> > >
> > > He may be right - we're serving static data (changes to the collection
> > > occur only 2 or 3 times a year and are minor)
> > >
> > > We probably could have 3 or 4 Solr nodes running in non-Cloud mode --
> > > each configured the same way, behind a load balancer and do fine.
> > >
> > > I've got a Kafka server set up with the solr docs as topics. It takes
> > > about 10 minutes to reload a "blank" Solr Server from the Kafka
> > > topic... If I target 3-4 SOLR servers from my microservice instead of
> > > one, it wouldn't take much longer than 10 minutes to concurrently
> > > reload all 3 or 4 Solr servers from scratch...
> > >
> > > I'm biased in terms of using the most recent functionality, but I'm
> > > aware that bias is not necessarily based on facts and want to do my
> > > due diligence...
> > >
> > > Aside from the obvious benefits of spreading work across nodes (which
> > > may not be a big deal in our application and which my IT guy proposes
> > > is more transparently handled with a load balancer he understands) are
> > > there any other considerations that would drive a choice for Solr
> > > Cloud (zookeeper etc)?
> > >
> > > On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com>
> > > wrote:
> > >
> > > > On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
> > > > <j...@johnbickerstaff.com> wrote:
> > > > > Thanks all - very helpful.
> > > > >
> > > > > @Shawn - your reply implies that even if I'm hitting the URL for a
> > > > > single endpoint via HTTP - the "balancing" will still occur across
> > > > > the Solr Cloud (I understand the caveat about that single endpoint
> > > > > being a potential point of failure). I just want to verify that
> > > > > I'm interpreting your response correctly...
> > > > >
> > > > > (I have been asked to provide IT with a comprehensive list of
> > > > > options prior to a design discussion - which is why I'm trying to
> > > > > get clear about the various options)
> > > > >
> > > > > In a nutshell, I think I understand the following:
> > > > >
> > > > > a. Even if hitting a single URL, the Solr Cloud will "balance"
> > > > > across all available nodes for searching
> > > > >    Caveat: That single URL represents a potential single
> > > > >    point of failure and this should be taken into account
> > > > >
> > > > > b. SolrJ's CloudSolrClient API provides the ability to distribute
> > > > > load -- based on Zookeeper's "knowledge" of all available Solr
> > > > > instances.
> > > > >    Note: This is more robust than "a" due to the fact that it
> > > > >    eliminates the "single point of failure"
> > > > >
> > > > > c. Use of a load balancer hitting all known Solr instances will be
> > > > > fine - although the search requests may not run on the Solr
> > > > > instance the load balancer targeted - due to "a" above.
> > > > >
> > > > > Corrections or refinements welcomed...
> > > >
> > > > With option a), although queries will be distributed across the
> > > > cluster, all queries will be going through that single node. Not only
> > > > is that a single point of failure, but you risk saturating the
> > > > inter-node network traffic, possibly resulting in lower QPS and
> > > > higher latency on your queries.
> > > >
> > > > With option b), as well as SolrJ, recent versions of pysolr have a
> > > > ZK-aware SolrCloud client that behaves in a similar way.
> > > >
> > > > With option c), you can use the preferLocalShards parameter so that
> > > > shards that are local to the queried node are used in preference to
> > > > distributed shards. Depending on your shard/cluster topology, this
> > > > can increase performance if you are returning large amounts of data -
> > > > many or large fields or many documents.
> > > >
> > > > Cheers
> > > >
> > > > Tom
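
For option c), here is a rough Python sketch of what a load-balanced request with preferLocalShards could look like. It rotates over a list of Solr nodes and appends preferLocalShards=true to each /select query. The host names, the collection name, and the build_select_url helper are all made up for illustration; only the preferLocalShards parameter itself comes from the thread. In a real setup the load balancer would do the rotation, so treat the round-robin part as a stand-in.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical node addresses and collection name, for illustration only.
SOLR_NODES = ["http://solr1:8983", "http://solr2:8983", "http://solr3:8983"]
COLLECTION = "mycollection"

# Endless round-robin iterator standing in for the load balancer.
_nodes = cycle(SOLR_NODES)


def build_select_url(query, rows=10, prefer_local=True):
    """Return a /select URL on the next node in round-robin order."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if prefer_local:
        # Ask the receiving node to favour shards it hosts itself.
        params["preferLocalShards"] = "true"
    return f"{next(_nodes)}/solr/{COLLECTION}/select?{urlencode(params)}"
```

Each call to build_select_url("title:solr") yields a URL on the next node in the list, which can then be fetched with any HTTP client. Whether the parameter helps depends on the shard layout, as Tom notes above.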