One thing I like about SolrCloud is that I don't have to configure Master/Slave 
replication in each "core" the same way to get them to replicate.

The other thing I like about SolrCloud, which is largely theoretical at this 
point, is that I don't need to test changes to a collection's configuration by 
bringing up a whole new Solr instance on a whole new server. SolrCloud already 
virtualizes this, so I can make up a random collection name that doesn't 
conflict, create the collection, and smoke test with it. I know that standard 
practice is to bring up all new nodes, but I don't see why this is needed.
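
For illustration, here is a rough sketch of that idea. It only builds the 
Collections API URLs for creating and later deleting a scratch collection. 
The host, the config set name "myconf", and the "smoketest_" prefix are 
assumptions made up for the example, and actually sending the requests is 
left to whatever HTTP client you use.

```python
from urllib.parse import urlencode
import uuid


def create_collection_url(base_url, config_name, name=None,
                          num_shards=1, replication_factor=1):
    """Return (name, url) for a Collections API CREATE request."""
    if name is None:
        # Random suffix so the name is unlikely to clash with a real collection.
        name = "smoketest_" + uuid.uuid4().hex[:8]
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Config set already uploaded to ZooKeeper (assumed name).
        "collection.configName": config_name,
    }
    return name, base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def delete_collection_url(base_url, name):
    """Return the Collections API DELETE URL to clean up after the smoke test."""
    params = {"action": "DELETE", "name": name}
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    name, url = create_collection_url("http://localhost:8983/solr", "myconf")
    print(url)
    print(delete_collection_url("http://localhost:8983/solr", name))
```

Deleting the scratch collection afterwards keeps the cluster state tidy.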

-----Original Message-----
From: John Bickerstaff [mailto:j...@johnbickerstaff.com] 
Sent: Monday, April 18, 2016 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Verifying - SOLR Cloud replaces load balancer?

So - my IT guy makes the case that we don't really need Zookeeper / Solr 
Cloud...

He may be right - we're serving static data (changes to the collection occur 
only 2 or 3 times a year and are minor).

We probably could have 3 or 4 Solr nodes running in non-Cloud mode -- each 
configured the same way, behind a load balancer -- and do fine.

I've got a Kafka server set up with the Solr docs as topics.  It takes about 10 
minutes to reload a "blank" Solr server from the Kafka topic...
If I target 3-4 Solr servers from my microservice instead of one, it shouldn't 
take much longer than 10 minutes to concurrently reload all 3 or 4 Solr servers 
from scratch...
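
A minimal sketch of that concurrent reload, assuming the documents have 
already been read from the Kafka topic into a list. The `send` function is a 
hypothetical placeholder for whatever posts a batch to one Solr server; it is 
not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor


def reload_all(servers, docs, send):
    """Index the same documents into every server concurrently.

    `send(server, docs)` is supplied by the caller (hypothetical here); it
    should post the documents to one Solr server and return a result, for
    example the number of documents indexed.
    """
    docs = list(docs)  # materialize once so every server gets the same batch
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {server: pool.submit(send, server, docs) for server in servers}
        # result() re-raises any exception raised in the worker thread.
        return {server: future.result() for server, future in futures.items()}


if __name__ == "__main__":
    def fake_send(server, docs):
        # Stand-in for a real HTTP update request.
        return len(docs)

    print(reload_all(["solr1", "solr2", "solr3"], [{"id": "1"}, {"id": "2"}], fake_send))
```

With one thread per server, total time should be close to that of the slowest 
single reload, provided the microservice and network are not the bottleneck.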

I'm biased in terms of using the most recent functionality, but I'm aware that 
bias is not necessarily based on facts and want to do my due diligence...

Aside from the obvious benefits of spreading work across nodes (which may not 
be a big deal in our application and which my IT guy proposes is more 
transparently handled with a load balancer he understands) are there any other 
considerations that would drive a choice for Solr Cloud (zookeeper etc)?



On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com> wrote:

> On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff 
> <j...@johnbickerstaff.com> wrote:
> > Thanks all - very helpful.
> >
> > @Shawn - your reply implies that even if I'm hitting the URL for a
> > single endpoint via HTTP - the "balancing" will still occur across
> > the Solr Cloud (I understand the caveat about that single endpoint
> > being a potential point of failure).  I just want to verify that I'm
> > interpreting your response correctly...
> >
> > (I have been asked to provide IT with a comprehensive list of
> > options prior to a design discussion - which is why I'm trying to
> > get clear about the various options)
> >
> > In a nutshell, I think I understand the following:
> >
> > a. Even if hitting a single URL, the Solr Cloud will "balance"
> >    across all available nodes for searching
> >    Caveat: That single URL represents a potential single point of
> >    failure and this should be taken into account
> >
> > b. SolrJ's CloudSolrClient API provides the ability to distribute
> >    load -- based on Zookeeper's "knowledge" of all available Solr
> >    instances.
> >    Note: This is more robust than "a" due to the fact that it
> >    eliminates the "single point of failure"
> >
> > c. Use of a load balancer hitting all known Solr instances will be
> >    fine - although the search requests may not run on the Solr
> >    instance the load balancer targeted - due to "a" above.
> >
> > Corrections or refinements welcomed...
>
> With option a), although queries will be distributed across the 
> cluster, all queries will be going through that single node. Not only 
> is that a single point of failure, but you risk saturating the 
> inter-node network, possibly resulting in lower QPS and higher 
> latency on your queries.
>
> With option b), as well as SolrJ, recent versions of pysolr have a 
> ZK-aware SolrCloud client that behaves in a similar way.
>
> With option c), you can use the preferLocalShards query parameter so 
> that shards local to the queried node are used in preference to shards 
> on other nodes. Depending on your shard/cluster topology, this can 
> increase performance if you are returning large amounts of data - many 
> or large fields or many documents.
>
> Cheers
>
> Tom
>
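
To make option c) concrete, here is a small sketch that builds a select URL 
with preferLocalShards=true. The node URL and the collection name "products" 
are assumptions for the example, and this shows only how the parameter is 
passed, not how the load balancer picks the node.

```python
from urllib.parse import urlencode


def select_url(node_url, collection, query, prefer_local_shards=True, rows=10):
    """Build a /select URL for the node the load balancer chose."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if prefer_local_shards:
        # Ask the receiving node to use its own replicas where it has them.
        params["preferLocalShards"] = "true"
    return "{}/{}/select?{}".format(node_url.rstrip("/"), collection,
                                    urlencode(params))


if __name__ == "__main__":
    print(select_url("http://solr1:8983/solr", "products", "*:*"))
```

The parameter only helps when the queried node actually hosts a replica of 
each shard it needs; otherwise the request is still sent to other nodes.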
