On 1/30/2013 6:45 AM, Lee, Peter wrote:
Upayavira,
Thank you for your response. I'm sorry my post is perhaps not clear... I am
relatively new to Solr and I'm not sure I'm using the correct nomenclature.
We did encounter the issue of one shard in the stripe going down while all of the
other servers continued to receive requests... and returned errors because of the
missing shard. We did in fact correct this problem by making our health check smart
enough to test all of the other servers in the stripe. That works very well and was
not hard at all to implement.
My intended question was entirely about performance. Perhaps if I am more
specific it will help.
We have 6 servers per "stripe" (meaning that a search request going to any one
of these servers also generates traffic on the other 5 servers in the stripe to fulfill
the request) and multiple stripes (for load and for redundancy). For this discussion,
though, let's assume we have only ONE stripe.
We currently have a load balancer that points to all 6 of the servers in our stripe. That
is, requests from "outside" can be directed to any server in the stripe.
The question is: Has anyone performed empirical testing to see if perhaps
having 2 or 3 servers (instead of all 6) on the load balancer improves
performance?
In this configuration, sure, not all servers can field requests from the "outside."
However, the total amount of "conversation" going on between the different servers will
also be lower, as distributed searches can now only originate from 2 or 3 servers in the stripe
(however many we attached to the load balancer).
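(For illustration: with Solr distributed search, each request carries a shards
parameter listing all six servers, so a query that lands on one server fans out
to the whole stripe. Hostnames, port, and core name below are made up.)

  # Hypothetical example of one outside request hitting server1; the shards
  # parameter is what generates the traffic to the other five servers.
  curl 'http://server1:8983/solr/core1/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'shards=server1:8983/solr/core1,server2:8983/solr/core1,server3:8983/solr/core1,server4:8983/solr/core1,server5:8983/solr/core1,server6:8983/solr/core1'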
We can perform this testing, but it will take time, so I thought I'd ask if anyone has
done this already. I was hoping to find a mention of a "best practice"
somewhere regarding this type of question, but I have not found one yet.
I have a multi-server distributed Solr 3.5 installation behind a load
balancer (haproxy). The application and the load balancer are
completely unaware of the shards parameter; that's handled in Solr.
Here's how I've made that work:
The core with the shards parameter (we refer to it as a broker core)
exists on all servers. There are two servers for chain A and two
servers for chain B. Three of the seven shards live on idxa1/idxb1 and
four of the shards live on idxa2/idxb2. The "shards" parameter on both
chain A servers points only to chain A shards. The same goes for chain B.
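A rough sketch of the broker core's request handler on a chain A server, in
solrconfig.xml terms (the port and the shard core names s1 through s7 are
placeholders; the hostnames match the ones above):

  <!-- Broker core on the chain A servers: the handler carries the full
       chain A shard list, so the application never sends a shards
       parameter of its own. -->
  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="shards">idxa1:8983/solr/s1,idxa1:8983/solr/s2,idxa1:8983/solr/s3,idxa2:8983/solr/s4,idxa2:8983/solr/s5,idxa2:8983/solr/s6,idxa2:8983/solr/s7</str>
    </lst>
  </requestHandler>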
The ping handler's health check query contains shards and shards.qt
parameters, so the health check will fail if any of the shards for that
chain are down.
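The ping handler ends up looking something like this (again, the port, core
names, the query, and the shards.qt target are placeholders):

  <!-- Because the health check query is itself distributed across the
       chain A shards, the ping fails if any one of them is down. -->
  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">*:*</str>
      <str name="shards">idxa1:8983/solr/s1,idxa1:8983/solr/s2,idxa1:8983/solr/s3,idxa2:8983/solr/s4,idxa2:8983/solr/s5,idxa2:8983/solr/s6,idxa2:8983/solr/s7</str>
      <str name="shards.qt">standard</str>
    </lst>
  </requestHandler>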
The load balancer has idxa1 and idxb1 as primary equal cost entries. It
has idxa2 and idxb2 as backup entries, with idxa2 having the higher
weight. In normal operation, queries only go to idxa1 and idxb1.
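In haproxy terms the backend looks roughly like this (backend name, ports, and
the check URI are placeholders; the server roles and weights mirror the
description above):

  backend solr
    balance roundrobin
    # Health check hits the broker core's ping handler on each server.
    option httpchk GET /solr/broker/admin/ping
    # Primary entries, equal cost: normal traffic goes only to these two.
    server idxa1 idxa1:8983 check
    server idxb1 idxb1:8983 check
    # Backup entries, only used when both primaries are down; idxa2 gets
    # the higher weight.
    server idxa2 idxa2:8983 check backup weight 200
    server idxb2 idxb2:8983 check backup weight 100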
If any shard failure happens on either chain A server, both the idxa1
and idxa2 entries will be marked down by the health check and queries
will only go to chain B.
I can also disable these servers from the load balancer's perspective
using the admin UI. If idxb1 is disabled, all queries will go to idxa1
(which utilizes both idxa1 and idxa2). In that situation, if any chain
A failure were to happen but the chain B shards were all still fine,
idxb2 would still be marked up and the load balancer would send the
queries there.
The two index chains are independently updated - no replication. This
allows me to disable either idxa1 or idxb1 and completely rebuild (or
upgrade) the disabled chain while the other chain remains online. I can
then switch and do the same thing to the other chain, and the
application using Solr has no idea anything has happened.
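The same disable/enable steps can also be scripted against haproxy's admin
socket instead of clicking through the stats page (socket path and backend
name are placeholders, and the socket must be configured with "level admin"):

  # Take idxb1 out of rotation before rebuilding or upgrading chain B:
  echo "disable server solr/idxb1" | socat stdio /var/run/haproxy.sock

  # (rebuild or upgrade chain B while chain A serves all of the traffic)

  # Bring it back, then repeat the process against idxa1 for chain A:
  echo "enable server solr/idxb1" | socat stdio /var/run/haproxy.sock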
Thanks,
Shawn