I haven't got any measurements to back this up, but I'd say there's no problem with pointing your load balancer at all of your nodes. When you do a distributed query, the work done by the coordinating node is relatively small: it pushes the request to all the shard nodes, then merges the results. That does not require large caches or anything similar, so I do not see any resource advantage in limiting query distribution to specific nodes.
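
To make that reasoning concrete, here is a rough back-of-envelope model in Python. It is a sketch, not a measurement: the function name `stripe_load`, the query rate, and the `phases` knob (how many round trips the coordinator makes per shard, simplified to 1 by default) are all assumptions for illustration. It also assumes the coordinating node queries every shard in the stripe, its own included, for every incoming request.

```python
def stripe_load(queries_per_sec, shards, lb_nodes, phases=1):
    """Back-of-envelope load model for one stripe (every input is an assumption)."""
    if not 1 <= lb_nodes <= shards:
        raise ValueError("lb_nodes must be between 1 and shards")
    # Each incoming query fans out to every shard once per phase,
    # no matter which node happens to coordinate it.
    shard_requests_total = queries_per_sec * shards * phases
    shard_requests_per_node = queries_per_sec * phases
    # Only the nodes behind the load balancer do the merge work.
    merges_per_lb_node = queries_per_sec / lb_nodes
    return {
        "shard_requests_total": shard_requests_total,
        "shard_requests_per_node": shard_requests_per_node,
        "merges_per_lb_node": merges_per_lb_node,
    }


# Hypothetical figures: 100 queries/sec against a 6-shard stripe.
for k in (2, 3, 6):
    print(k, stripe_load(100, 6, k))
```

Under these assumptions the shard-level traffic is the same whether 2, 3 or 6 nodes sit behind the load balancer; only the (comparatively cheap) merge work is concentrated on fewer nodes. Real testing may of course show effects this model ignores.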
Upayavira

On Wed, Jan 30, 2013, at 01:45 PM, Lee, Peter wrote:
> Upayavira,
>
> Thank you for your response. I'm sorry my post is perhaps not clear...I
> am relatively new to solr and I'm not sure I'm using the correct
> nomenclature.
>
> We did encounter the issue of one shard in the stripe going down and all
> other shards continue to receive requests...and return errors because of
> the missing shard. We did in fact correct this problem by making our
> healthcheck smart enough to test all of the other servers in the stripe.
> That works very well and was not hard at all to implement.
>
> My intended question was one entirely about performance. Perhaps if I am
> more specific it will help.
>
> We have 6 servers per "stripe" (which means, a search request going to
> any one of these servers also generates traffic on the other 5 servers in
> the stripe to fulfill the request) and multiple stripes (for load and for
> redundancy). For this discussion though, let's assume we have only ONE
> stripe.
>
> We currently have a load balancer that points to all 6 of the servers in
> our stripe. That is, requests from "outside" can be directed to any
> server in the stripe.
>
> The question is: Has anyone performed empirical testing to see if perhaps
> having 2 or 3 servers (instead of all 6) on the load balancer improves
> performance?
>
> In this configuration, sure, not all servers can field requests from the
> "outside." However, the total amount of "conversation" going on between
> the different servers will also be lower, as distributed searches can now
> only originate from 2 or 3 servers in the stripe (however many we
> attached to the load balancer).
>
> We can perform this testing, but it will take time, so I thought I'd ask
> if anyone has done this already. I was hoping to find a mention of a
> "best practice" somewhere regarding this type of question, but I have not
> found one yet.
>
> Thanks.
>
> Peter S. Lee
>
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Wednesday, January 30, 2013 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: A question about attaching shards to load balancers
>
> I'm afraid I'm not completely clear about your scenario. Let me say how
> I understand what you're saying, and what I've done in the past.
>
> Firstly, I take it you are using Solr 3.x (from your reference to a
> 'shards' parameter).
>
> Secondly, you refer to a 'stripe' as one set of nodes, one for each
> shard, that are enough to allow querying your whole collection.
>
> Having created the concept of a 'slice', you then hardwire the 'shards'
> parameter in solrconfig.xml in each machine in that slice to point to all
> the other nodes in that same slice.
>
> Then you point your load balancer at some boxes, which will do
> distributed queries. Now, by the sounds of it, every box on your setup
> could do this, they all have a shards parameter set up. Minimally, you'll
> want at least one box from each slice, otherwise you'll have slices that
> aren't receiving queries. But could you include all of your boxes, and
> have all of them handling the query distribution work? I guess you could,
> but I'd suggest another architecture.
>
> In the setup you describe, if you lose one node, you lose an entire
> slice. However, if a distributed query comes into another node in the
> slice, the load balancer may well not notice (unless you make the
> healthcheck itself do a distributed search) and things could get messy.
>
> What I've done is set up a VIP in my load balancer for each and every
> node that can service a shard. Repeat that for each shard that I have.
> Let's say I have four shards, I'll end up with four VIPs. I then put
> those four VIPs into my shards parameter in solrconfig.xml on all of my
> hosts, regardless of what shard/slice.
>
> Then, I create another VIP that includes all of my nodes in it. That is
> the one that I hand to my application.
>
> This way, you can lose any node in any shard and the thing should keep
> on going.
>
> Obviously I'm talking about slaves here. There will be a master for each
> shard which each of these nodes pull their indexes from.
>
> Hope this is helpful.
>
> Upayavira
>
> On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> > I would appreciate people's experience on the following load balancing
> > question...
> >
> > We currently have solr configured in shards across multiple machines
> > to handle our load. That is, a request being sent to any one of these
> > servers will cause that server to query the rest of the servers in
> > that "stripe" (we use the term "stripe" to refer to a set of servers
> > that point to each other with the shard parameter).
> >
> > We currently have all servers in a stripe registered with our load
> > balancer. Thus, requests are being spread out across all servers in
> > the stripe...but of course requests to any shard generate additional
> > traffic on all shards in that stripe.
> >
> > My question (finally) is this: Has anyone determined if it is better
> > to place only a few (that is, not all) of the shards in a stripe on
> > the load balancer as versus ALL of the shards in a stripe on the load
> > balancer? It seemed to me at first that it would not make much of a
> > difference, but then I realized that this would really depend on the
> > relative costs of a few different steps (one step would be the cost of
> > collecting all of the responses from the other servers in the shard to
> > formulate the final answer. Another step would be the cost of
> > generating more traffic between the shards, etc.).
> >
> > So what I am trying to ask is this: If we had 6 servers in a "stripe"
> > (6 servers set up as shards to support a single query), would there be
> > any advantage with respect to handling load to only place ONE or TWO
> > of the shards on the load balancer as versus putting ALL shards on the
> > load balancer?
> >
> > We can test this empirically but if the community already has gotten a
> > feel for the best practice in this situation I would be happy to learn
> > from your experience. I could not find anything online that spoke to
> > this particular situation.
> >
> > Thanks.
> >
> > Peter S. Lee
> > Senior Software Engineer
> > ProQuest
> > 789 E. Eisenhower Parkway
> > Ann Arbor, MI, 48106-1346
> > USA
> > 734-761-4700 x72025
> > peter....@proquest.com
> > www.proquest.com
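
For reference, the stripe-wide healthcheck mentioned in the thread (a node only reports healthy if every server in its stripe responds) could be sketched roughly as follows. This is not the poster's actual implementation. The function names and example URLs are hypothetical, and the sketch assumes each node exposes Solr's ping handler at `/admin/ping` under its base URL, which depends on how solrconfig.xml is set up.

```python
from urllib.request import urlopen


def node_is_up(base_url, timeout=2.0):
    """Return True if the node answers its (assumed) ping URL with HTTP 200."""
    try:
        with urlopen(base_url + "/admin/ping", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection failures, timeouts and HTTP error responses.
        return False


def stripe_is_healthy(node_urls, probe=node_is_up):
    """Healthy only if every node in the stripe passes the probe."""
    return all(probe(url) for url in node_urls)


# Hypothetical usage; replace with the real nodes of one stripe:
# stripe_is_healthy(["http://solr1:8983/solr", "http://solr2:8983/solr"])
```

The `probe` argument is injectable so the stripe logic can be exercised without a network; a stricter variant could issue a real distributed search instead of a ping, as suggested in the quoted reply.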