Re: Verifying - SOLR Cloud replaces load balancer?

John Bickerstaff Mon, 18 Apr 2016 17:15:47 -0700

Thanks Erick, for the confirmation.
On Apr 18, 2016 5:48 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:


> In short, I'm afraid I have to agree with your IT guy.
>
> I like SolrCloud, it's waaaay cool. But in your situation I really
> can't say it's compelling.
>
> The places SolrCloud shines: automatically routing docs to shards...
> You're not sharding.
>
> Automatically electing a new leader (analogous to master) ... You
> don't care since the pain of reindexing is so little.
>
> Not losing data when a leader/master goes down during indexing... You
> don't care since you can reindex quickly and you're indexing so
> rarely.
>
> In fact, I'd also optimize the index, something I rarely recommend.
>
> Even the argument that you get to use all your nodes for searching
> doesn't really pertain, since you can index on a node, then just copy
> the index to all your nodes. You could get by without even configuring
> master/slave. Or just, as you say, index to all your Solr nodes
> simultaneously.
>
> About the only downside is that you've got to create your Solr nodes
> independently, making sure the proper configurations are on each one
> etc, but even if those changed 2-3 times a year it's hardly onerous.
>
> You _are_ getting all the latest and greatest indexing and search
> improvements; all the SolrCloud stuff is built on top of exactly the
> Solr you'd get without using SolrCloud.
>
> And finally, there is certainly a learning curve to SolrCloud,
> particularly in this case the care and feeding of Zookeeper.
>
> The instant you need to have shards, the argument changes quite
> dramatically. The argument changes some under significant indexing
> loads. The argument totally changes if you need low latency. It
> doesn't sound like your situation is sensitive to any of these
> though....
>
> Best,
> Erick
>
> On Apr 18, 2016 10:41 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
> >
> > Nice - thanks Daniel.
> >
> > On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] <
> > daniel.da...@nih.gov> wrote:
> >
> > > One thing I like about SolrCloud is that I don't have to configure
> > > Master/Slave replication in each "core" the same way to get them to
> > > replicate.
> > >
> > > The other thing I like about SolrCloud, which is largely theoretical at
> > > this point, is that I don't need to test changes to a collection's
> > > configuration by bringing up a whole new solr on a whole new server -
> > > SolrCloud already virtualizes this, and so I can make up a random
> > > > collection name that doesn't conflict, and create the thing, and
> > > > smoke test with it. I know that standard practice is to bring up all
> > > > new nodes, but I don't see why this is needed.
> > >
> > > -----Original Message-----
> > > From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> > > Sent: Monday, April 18, 2016 1:23 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Verifying - SOLR Cloud replaces load balancer?
> > >
> > > > So - my IT guy makes the case that we don't really need Zookeeper /
> > > > Solr Cloud...
> > >
> > > He may be right - we're serving static data (changes to the collection
> > > occur only 2 or 3 times a year and are minor)
> > >
> > > > We probably could have 3 or 4 Solr nodes running in non-Cloud mode --
> > > > each configured the same way, behind a load balancer and do fine.
> > >
> > > > I've got a Kafka server set up with the solr docs as topics.  It takes
> > > > about 10 minutes to reload a "blank" Solr Server from the Kafka
> > > > topic... If I target 3-4 SOLR servers from my microservice instead of
> > > > one, it wouldn't take much longer than 10 minutes to concurrently
> > > > reload all 3 or 4 Solr servers from scratch...
> > >
> > > > I'm biased in terms of using the most recent functionality, but I'm
> > > > aware that bias is not necessarily based on facts and want to do my
> > > > due diligence...
> > >
> > > > Aside from the obvious benefits of spreading work across nodes (which
> > > > may not be a big deal in our application and which my IT guy proposes
> > > > is more transparently handled with a load balancer he understands) are
> > > > there any other considerations that would drive a choice for Solr
> > > > Cloud (zookeeper etc)?
> > >
> > >
> > >
> > > On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com>
> > > wrote:
> > >
> > > > On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
> > > > <j...@johnbickerstaff.com> wrote:
> > > > > Thanks all - very helpful.
> > > > >
> > > > > > @Shawn - your reply implies that even if I'm hitting the URL for a
> > > > > > single endpoint via HTTP - the "balancing" will still occur across
> > > > > > the Solr Cloud (I understand the caveat about that single endpoint
> > > > > > being a potential point of failure).  I just want to verify that
> > > > > > I'm interpreting your response correctly...
> > > > >
> > > > > > (I have been asked to provide IT with a comprehensive list of
> > > > > > options prior to a design discussion - which is why I'm trying to
> > > > > > get clear about the various options)
> > > > >
> > > > > In a nutshell, I think I understand the following:
> > > > >
> > > > > a. Even if hitting a single URL, the Solr Cloud will "balance"
> > > > > across all available nodes for searching
> > > > >           Caveat: That single URL represents a potential single
> > > > > point of failure and this should be taken into account
> > > > >
> > > > > > b. SolrJ's CloudSolrClient API provides the ability to distribute
> > > > > > load -- based on Zookeeper's "knowledge" of all available Solr
> > > > > > instances.
> > > > > >           Note: This is more robust than "a" due to the fact that
> > > > > > it eliminates the "single point of failure"
> > > > >
> > > > > > c.  Use of a load balancer hitting all known Solr instances will be
> > > > > > fine - although the search requests may not run on the Solr
> > > > > > instance the load balancer targeted - due to "a" above.
> > > > >
> > > > > Corrections or refinements welcomed...
> > > >
> > > > > With option a), although queries will be distributed across the
> > > > > cluster, all queries will be going through that single node. Not only
> > > > > is that a single point of failure, but you risk saturating the
> > > > > inter-node network traffic, possibly resulting in lower QPS and
> > > > > higher latency on your queries.
> > > >
> > > > With option b), as well as SolrJ, recent versions of pysolr have a
> > > > ZK-aware SolrCloud client that behaves in a similar way.
> > > >
> > > > > With option c), you can use the preferLocalShards parameter so that
> > > > > shards that are local to the queried node are used in preference to
> > > > > distributed shards. Depending on your shard/cluster topology, this
> > > > > can increase performance if you are returning large amounts of data -
> > > > > many or large fields or many documents.
> > > >
> > > > Cheers
> > > >
> > > > Tom
> > > >
> > >
>
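
Tom's option c) at the bottom of the quoted thread can be sketched in
Python. This is only an illustration under stated assumptions, not anyone's
actual setup: the node addresses and the collection name are hypothetical,
and the round-robin helper merely stands in for what a real load balancer
would do. The only Solr-specific piece is the preferLocalShards request
parameter Tom mentions, which is passed like any other query parameter and
only matters when the request is a distributed SolrCloud query.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical node addresses and collection name; substitute your own.
SOLR_NODES = [
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
    "http://solr3.example.com:8983/solr",
]
COLLECTION = "mycollection"


def build_select_url(node, collection, query, prefer_local=True):
    """Return a /select URL for one node, optionally preferring local shards."""
    params = {"q": query, "wt": "json"}
    if prefer_local:
        # Ask the receiving node to use its own replicas where it has them
        # instead of fanning the request out over the network.
        params["preferLocalShards"] = "true"
    return f"{node}/{collection}/select?{urlencode(params)}"


def round_robin_urls(nodes, collection, queries):
    """Pair each query with the next node in turn, as a load balancer might."""
    node_cycle = cycle(nodes)
    return [build_select_url(next(node_cycle), collection, q) for q in queries]


if __name__ == "__main__":
    for url in round_robin_urls(SOLR_NODES, COLLECTION, ["title:solr", "*:*"]):
        print(url)
```

Each URL could then be fetched with any HTTP client. With a ZK-aware client
(option b) the node selection would come from Zookeeper rather than from a
hard-coded list like the one assumed here.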
