On 8/28/2013 11:56 AM, Jared Griffith wrote:
What is the recommended way to set up Solr so it's HA and fault tolerant?
I'm assuming it would be the SolrCloud set up. I'm guessing that Example C
(http://wiki.apache.org/solr/SolrCloud) would be the optimum set up. If
so, would one set up a load balancer (like F5 or whatever) to direct
requests to the Zookeeper instances?
Example C runs everything on localhost, so one machine failure takes down
the whole cloud. That's not really redundant. If you put the pieces of
example C on separate hosts, then it would very likely be redundant.
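You do not need (or want) a load balancer for zookeeper. Solr and SolrJ
are given the full list of zookeeper hosts, and the zookeeper client
moves to another host by itself if the one it is using goes down. If
your Solr client code is not written in Java, you might want a load
balancer for Solr, though. The Java client (SolrJ, specifically the
CloudSolrServer class) doesn't require a load balancer for HA, because
it reads the cluster state from zookeeper and only sends requests to
live Solr nodes.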
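For a non-Java client, the load balancer's job in front of Solr is mostly
round-robin with failover: spread requests across the nodes and skip any
node that is down. The sketch below only illustrates that behavior; the
class name RoundRobinBalancer, the node URLs, and the isUp health check are
hypothetical, and in practice you would use a real load balancer (or SolrJ's
CloudSolrServer from Java) rather than this code.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical illustration of round-robin with failover, the behavior a
// load balancer provides in front of several Solr nodes.
class RoundRobinBalancer {
    private final List<String> nodes;
    private int next = 0; // index of the node to try first on the next call

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Returns the next node that passes the health check, trying each node
    // at most once. Returns empty if every node is down.
    synchronized Optional<String> pick(Predicate<String> isUp) {
        for (int i = 0; i < nodes.size(); i++) {
            String candidate = nodes.get((next + i) % nodes.size());
            if (isUp.test(candidate)) {
                // Start after the chosen node next time, to rotate the load.
                next = (next + i + 1) % nodes.size();
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of(
                "http://server1:8983/solr", "http://server2:8983/solr"));
        // Simulate server1 being down: every request goes to server2.
        Predicate<String> up = url -> !url.contains("server1");
        System.out.println(lb.pick(up).orElse("none"));
        System.out.println(lb.pick(up).orElse("none"));
    }
}
```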
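For a SolrCloud setup with HA, you need at least three separate physical
hosts, because zookeeper only works while a majority of its nodes are up,
and three is the smallest ensemble that can lose a node and still have a
majority. A bare minimum setup has two capable servers that will each run
one copy of Solr and one copy of Zookeeper. The third can be less
capable and run zookeeper only. If you want to run Solr on all three,
you certainly can.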
You can also add more nodes for Solr. Additional zookeeper nodes are not
required, but if you want them, be sure you have an odd number (five,
seven), because going to an even number adds no fault tolerance.
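The odd-number advice comes from ZooKeeper's majority rule: an ensemble of n
servers needs n/2 + 1 of them (integer division) running to keep working.
The small sketch below just does that arithmetic; the class and method names
are made up for illustration. It shows that four servers tolerate the same
single failure as three, and that two servers tolerate none.

```java
// Quorum arithmetic for a ZooKeeper ensemble: the ensemble stays available
// only while a strict majority of its servers are up.
class QuorumMath {
    // Smallest number of servers that forms a majority of an ensemble of size n.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // How many servers can fail while a majority is still up.
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("%d servers: quorum %d, tolerates %d failure(s)%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```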
You would download zookeeper and follow the instructions to create a
three-node replicated setup:
http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
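In that replicated setup, every zookeeper host gets the same zoo.cfg, which
lists all ensemble members as server.N=host:2888:3888 lines. Each host also
needs a myid file in its dataDir holding its own N. The sketch below renders
such a file. The tickTime, initLimit, and syncLimit values are the ones used
in the ZooKeeper "Getting Started" example. The class name ZooCfg, the
hostnames, and the dataDir path are placeholders, not recommendations.

```java
import java.util.List;

// Hypothetical helper that renders the zoo.cfg shared by every member of a
// replicated ensemble. Hostnames and dataDir are placeholders.
class ZooCfg {
    static String render(List<String> hosts, String dataDir, int clientPort) {
        StringBuilder sb = new StringBuilder();
        sb.append("tickTime=2000\n");
        sb.append("dataDir=").append(dataDir).append('\n');
        sb.append("clientPort=").append(clientPort).append('\n');
        sb.append("initLimit=5\n");
        sb.append("syncLimit=2\n");
        // server.N=host:peerPort:leaderElectionPort. N must match the number
        // stored in that host's dataDir/myid file.
        for (int i = 0; i < hosts.size(); i++) {
            sb.append("server.").append(i + 1).append('=')
              .append(hosts.get(i)).append(":2888:3888\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of("server1", "server2", "server3"),
                "/var/lib/zookeeper", 2181));
    }
}
```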
For Solr, it's best if you run the latest version, currently 4.4.0. You
can put your zkHost parameter (and other solrcloud parameters) in
solr.xml. Your zkHost parameter should look like the following, where
you use the correct port(s) and a value for the chroot (/mysolr1) that
names your cloud:
server1:2181,server2:2181,server3:2181/mysolr1
A note on the chroot functionality: by using a different chroot value
for each SolrCloud, you can use one zookeeper ensemble for more than one
SolrCloud. SolrCloud doesn't put much load on zookeeper. The load only
gets noticeably higher if you have hundreds of Solr nodes that go up and
down a lot.
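The zkHost value is the list of host:port pairs separated by commas, with the
chroot appended once at the very end rather than after each host. The sketch
below assembles it and shows two clouds sharing one ensemble through different
chroots. The ZkHost class and the /mysolr2 chroot are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that assembles a SolrCloud zkHost value: comma-separated
// host:port pairs for the zookeeper ensemble, followed by one chroot path.
class ZkHost {
    static String build(List<String> hosts, int port, String chroot) {
        if (!chroot.startsWith("/")) {
            throw new IllegalArgumentException("chroot must start with '/'");
        }
        String ensemble = hosts.stream()
                .map(h -> h + ":" + port)
                .collect(Collectors.joining(","));
        // The chroot is appended once, after the last host, not per host.
        return ensemble + chroot;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("server1", "server2", "server3");
        // Two independent clouds sharing one ensemble via different chroots.
        System.out.println(build(hosts, 2181, "/mysolr1"));
        System.out.println(build(hosts, 2181, "/mysolr2"));
    }
}
```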
It's my opinion that you should not use the numShards parameter on the
command line or in solr.xml, or use the startup options for bootstrapping
a config. I think it's better to use the zkCli "upconfig" option to
upload config sets to zookeeper, and specify the collection.configName,
numShards, and replicationFactor via the Collections API CREATE action.
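After the config set has been uploaded with zkCli's "upconfig", creating the
collection is one HTTP request to /solr/admin/collections carrying
action=CREATE plus the name, numShards, replicationFactor, and
collection.configName parameters. The sketch below builds that URL without
sending it. The base URL, the collection name "mycollection", and the config
name "myconf" are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds a Collections API CREATE request URL.
// Assumes the config set was already uploaded to zookeeper as configName.
class CreateCollectionUrl {
    static String build(String solrBase, String name, int numShards,
                        int replicationFactor, String configName) {
        // LinkedHashMap keeps the parameters in a predictable order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", Integer.toString(numShards));
        params.put("replicationFactor", Integer.toString(replicationFactor));
        params.put("collection.configName", configName);
        String query = params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
        return solrBase + "/admin/collections?" + query;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://server1:8983/solr",
                "mycollection", 2, 2, "myconf"));
    }
}
```

Any Solr node in the cloud can receive this request, so a client that is not
written in Java can send it to whichever node is reachable.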
If you want to go to the freenode IRC system (www.freenode.net) and
join the #solr channel, you can get more interactive help. I have no
problem sticking with the mailing list either.
Thanks,
Shawn