Re: A question about attaching shards to load balancers

Upayavira Wed, 30 Jan 2013 06:41:07 -0800

I haven't got anything to back this up, but I'd say there's no issue
pointing your load balancer to all your nodes. When you do a distributed
query, the work required of the distributed part is relatively small -
it pushes the request to all the shard nodes, then does the job of
merging the results. This does not require large caches or anything of
the sort, so I do not see that you would gain any resource advantage by
limiting distributed queries to specific nodes.
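
To make the "push the request out, then merge" step concrete, here is a
minimal Python sketch of what the coordinating node does. It is only an
illustration, not Solr's actual code: the names query_shard, merge_top_k
and distributed_query are made up, and each hit is assumed to be a
(score, doc_id) tuple that the shard returns sorted by descending score.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def merge_top_k(shard_results, k):
    """Merge per-shard hit lists into one global top-k list.

    Each input list must already be sorted by descending score, which is
    what lets heapq.merge stream them without re-sorting everything.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, k))


def distributed_query(shards, query_shard, query, k=10):
    """Fan the query out to every shard in parallel, then merge the hits.

    `query_shard(shard, query, k)` is a hypothetical callable standing in
    for the HTTP request a coordinating node would send to one shard.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: query_shard(s, query, k), shards))
    return merge_top_k(per_shard, k)
```

The coordinator only holds k hits per shard in memory while merging, which
is why this role needs little in the way of caches, whichever node plays it.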


Upayavira

On Wed, Jan 30, 2013, at 01:45 PM, Lee, Peter wrote:
> Upayavira,
> 
> Thank you for your response. I'm sorry my post is perhaps not clear...I
> am relatively new to solr and I'm not sure I'm using the correct
> nomenclature.
> 
> We did encounter the issue of one shard in the stripe going down and all
> other shards continue to receive requests...and return errors because of
> the missing shard. We did in fact correct this problem by making our
> healthcheck smart enough to test all of the other servers in the stripe.
> That works very well and was not hard at all to implement.
> 
> My intended question was one entirely about performance.  Perhaps if I am
> more specific it will help.
> 
> We have 6 servers per "stripe" (which means, a search request going to
> any one of these servers also generates traffic on the other 5 servers in
> the stripe to fulfill the request) and multiple stripes (for load and for
> redundancy). For this discussion though, let's assume we have only ONE
> stripe.
> 
> We currently have a load balancer that points to all 6 of the servers in
> our stripe. That is, requests from "outside" can be directed to any
> server in the stripe.
> 
> The question is: Has anyone performed empirical testing to see if perhaps
> having 2 or 3 servers (instead of all 6) on the load balancer improves
> performance? 
> 
> In this configuration, sure, not all servers can field requests from the
> "outside." However, the total amount of "conversation" going on between
> the different servers will also be lower, as distributed searches can now
> only originate from 2 or 3 servers in the stripe (however many we
> attached to the load balancer).
> 
> We can perform this testing, but it will take time, so I thought I'd ask
> if anyone has done this already. I was hoping to find a mention of a
> "best practice" somewhere regarding this type of question, but I have not
> found one yet.
> 
> Thanks.
> 
> Peter S. Lee
> 
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk] 
> Sent: Wednesday, January 30, 2013 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: A question about attaching shards to load balancers
> 
> I'm afraid I'm not completely clear about your scenario. Let me say how
> I understand what you're saying, and what I've done in the past.
> 
> Firstly, I take it you are using Solr 3.x (from your reference to a
> 'shards' parameter).
> 
> Secondly, you refer to a 'stripe' as one set of nodes, one for each
> shard, that are enough to allow querying your whole collection.
> 
> Having created the concept of a 'slice', you then hardwire the 'shards'
> parameter in solrconfig.xml in each machine in that slice to point to all
> the other nodes in that same slice.
> 
> Then you point your load balancer at some boxes, which will do
> distributed queries. Now, by the sounds of it, every box in your setup
> could do this, since they all have a shards parameter set up. Minimally, you'll
> want at least one box from each slice, otherwise you'll have slices that
> aren't receiving queries. But could you include all of your boxes, and
> have all of them handling the query distribution work? I guess you could,
> but I'd suggest another architecture.
> 
> In the setup you describe, if you lose one node, you lose an entire
> slice. However, if a distributed query comes into another node in the
> slice, the load balancer may well not notice (unless you make the
> healthcheck itself do a distributed search) and things could get messy.
> 
> What I've done is set up one VIP in my load balancer per shard, containing
> every node that can service that shard. Let's say I have four shards:
> I'll end up with four VIPs. I then put those four VIPs into the shards
> parameter in solrconfig.xml on all of my hosts, regardless of which
> shard/slice they belong to.
> 
> Then, I create another VIP that includes all of my nodes in it. That is
> the one that I hand to my application. 
> 
> This way, you can lose any node in any shard and the thing should keep
> on going. 
> 
> Obviously I'm talking about slaves here. There will be a master for each
> shard, from which each of these nodes pulls its index.
> 
> Hope this is helpful.
> 
> Upayavira
> 
> On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> > I would appreciate people's experience on the following load balancing 
> > question...
> > 
> > We currently have solr configured in shards across multiple machines 
> > to handle our load. That is, a request being sent to any one of these 
> > servers will cause that server to query the rest of the servers in 
> > that "stripe" (we use the term "stripe" to refer to a set of servers 
> > that point to each other with the shard parameter).
> > 
> > We currently have all servers in a stripe registered with our load 
> > balancer. Thus, requests are being spread out across all servers in 
> > the stripe...but of course requests to any shard generate additional 
> > traffic on all shards in that stripe.
> > 
> > My question (finally) is this: Has anyone determined if it is better 
> > to place only a few (that is, not all) of the shards in a stripe on 
> > the load balancer versus ALL of the shards in a stripe on the load 
> > balancer? It seemed to me at first that it would not make much of a 
> > difference, but then I realized that this would really depend on the 
> > relative costs of a few different steps (one step would be the cost of 
> > collecting all of the responses from the other servers in the stripe to 
> > formulate the final answer. Another step would be the cost of 
> > generating more traffic between the shards, etc.).
> > 
> > So what I am trying to ask is this: If we had 6  servers in a "stripe" 
> > (6 servers set up as shards to support a single query), would there be 
> > any advantage with respect to handling load to only place ONE or TWO 
> > of the shards on the load balancer versus putting ALL shards on the 
> > load balancer?
> > 
> > We can test this empirically but if the community already has gotten a 
> > feel for the best practice in this situation I would be happy to learn 
> > from your experience. I could not find anything online that spoke to 
> > this particular situation.
> > 
> > Thanks.
> > 
> > Peter S. Lee
> > Senior Software Engineer
> > ProQuest
> > 789 E. Eisenhower Parkway
> > Ann Arbor, MI, 48106-1346
> > USA
> > 734-761-4700 x72025
> > peter....@proquest.com
> > www.proquest.com
> > 
> > ProQuest...Start here
> > InformationWeek 500 Top
> > Innovator<http://www.proquest.com/en-US/aboutus/pressroom/09/20090922.shtml>
> > 
> 
>
