Hello,

I've been dealing with the same question these days. In architectural terms, it's always better to separate services (Solr and ZooKeeper, in this case) rather than keep them on a single instance. However, when we have to deal with cost constraints, all of us are quite limited and must choose the best option in terms of architecture, scalability and single points of failure. As I see it, the options are:

1. Solr servers with embedded ZooKeeper.
2. Solr servers with an external ZooKeeper.
3. Solr servers with an external ZooKeeper ensemble.

Note: as far as I know, ZooKeeper stays available only while a majority of the ensemble is up, so the recommended number of ZooKeeper services to avoid a single point of failure is ZkNum = 2 * F + 1, where F is the number of ZooKeeper failures you want to survive (3 services tolerate 1 failure). If you have ...

The best option is the third one. Reasons:

1. If one of your Solr servers goes down, the ZooKeeper services are still up.
2. If one of your ZooKeeper services goes down, the Solr servers and the rest of the ZooKeeper services are still up.

Considering that option, we have two ways to implement it in production:

1. Each service (Solr and ZooKeeper) on separate machines. Let's imagine that we have 2 shards for a given collection, so we need at least 4 Solr servers to complete the leader-replica configuration. The best option is to deploy them on four Amazon instances, one per server. We also need at least 3 ZooKeeper services in an ensemble configuration. The optimal way to install them is on separate machines (a micro instance will be fine for ZooKeeper), so we will have 7 Amazon instances. The reason is that if one machine goes down (a Solr or a ZooKeeper one), the other services stay up and your production environment will be safe. For me this is the best case, but it is also the most expensive one, so in my case it is not feasible.

2. As we need at least 4 Solr servers and 3 ZooKeeper services running, I would set up three Amazon instances with both Solr and ZooKeeper, and a fourth one with only Solr. So we'll have 3 complete Amazon instances (Solr + ZooKeeper) and 1 single Amazon instance (Solr only). If one of them goes down, the production environment will still be safe. This architecture is not the best one, as I said, but I think it is optimal in terms of robustness, single points of failure and cost.

It would be a pleasure to hear new suggestions from other people who have dealt with this kind of issue.

Regards,

- Luis Cappa.
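
To make the comparison between the two layouts concrete, here is a minimal Python sketch that checks how many arbitrary instance failures each layout survives. It assumes only two rules: ZooKeeper needs a strict majority of the ensemble up, and every shard needs at least one live Solr replica. The instance names, the shard labels "A" and "B", and the helper functions are hypothetical and only serve the illustration; this is not how Solr or ZooKeeper compute anything internally.

```python
from itertools import combinations


def zk_has_quorum(alive: int, ensemble_size: int) -> bool:
    """ZooKeeper keeps serving only while a strict majority is up."""
    return alive > ensemble_size // 2


def survives(instances: dict, down: set) -> bool:
    """True if ZooKeeper has quorum and every shard keeps one live replica."""
    zk_total = sum(1 for inst in instances.values() if inst["zk"])
    all_shards = {i["shard"] for i in instances.values() if i["shard"]}
    up = [inst for name, inst in instances.items() if name not in down]
    zk_alive = sum(1 for inst in up if inst["zk"])
    live_shards = {inst["shard"] for inst in up if inst["shard"]}
    return zk_has_quorum(zk_alive, zk_total) and live_shards == all_shards


def max_tolerated_failures(instances: dict) -> int:
    """Largest k such that ANY k instances can fail and the cluster survives."""
    names = list(instances)
    tolerated = 0
    for k in range(1, len(names) + 1):
        if all(survives(instances, set(c)) for c in combinations(names, k)):
            tolerated = k
        else:
            break
    return tolerated


# Layout 1 (assumed): 4 Solr-only instances + 3 ZooKeeper-only instances.
dedicated = {
    "solr1": {"shard": "A", "zk": False},
    "solr2": {"shard": "A", "zk": False},
    "solr3": {"shard": "B", "zk": False},
    "solr4": {"shard": "B", "zk": False},
    "zk1": {"shard": None, "zk": True},
    "zk2": {"shard": None, "zk": True},
    "zk3": {"shard": None, "zk": True},
}

# Layout 2 (assumed): 3 instances with Solr + ZooKeeper, 1 with Solr only.
colocated = {
    "node1": {"shard": "A", "zk": True},
    "node2": {"shard": "A", "zk": True},
    "node3": {"shard": "B", "zk": True},
    "node4": {"shard": "B", "zk": False},
}

if __name__ == "__main__":
    for label, layout in (("dedicated", dedicated), ("colocated", colocated)):
        print(label, len(layout), "instances, tolerates any",
              max_tolerated_failures(layout), "failure(s)")
```

Under these assumptions both layouts survive the loss of any one instance but not of any two, so the difference between them is mainly the number of machines and the fact that restarting a combined node also restarts a ZooKeeper member.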

2012/11/21 Marcin Rzewucki <mrzewu...@gmail.com>

> Yes, I meant the same (not -zkRun). However, I was asking if it is safe to
> have zookeeper and solr processes running on the same node or better on
> different machines?
>
> On 21 November 2012 21:18, Rafał Kuć <r....@solr.pl> wrote:
>
> > Hello!
> >
> > As I told I wouldn't use the Zookeeper that is embedded into Solr, but
> > rather setup a standalone one.
> >
> > --
> > Regards,
> > Rafał Kuć
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> >
> > > First of all: thank you for your answers. Yes, I meant side by side
> > > configuration. I think the worst case for ZKs here is to loose two of
> > > them. However, I'm going to use 4 availability zones in same region so
> > > at least this will reduce the risk of loosing both of them at the same
> > > time.
> > > Regards.
> > >
> > > On 21 November 2012 17:06, Rafał Kuć <r....@solr.pl> wrote:
> > >
> > >> Hello!
> > >>
> > >> Zookeeper by itself is not demanding, but if something happens to your
> > >> nodes that have Solr on it, you'll loose ZooKeeper too if you have
> > >> them installed side by side. However if you will have 4 Solr nodes and
> > >> 3 ZK instances you can get them running side by side.
> > >>
> > >> --
> > >> Regards,
> > >> Rafał Kuć
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> > >>
> > >> > Separate is generally nice because then you can restart Solr nodes
> > >> > without consideration for ZooKeeper.
> > >> >
> > >> > Performance-wise, I doubt it's a big deal either way.
> > >> >
> > >> > - Mark
> > >> >
> > >> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I have 4 solr collections, 2-3mn documents per collection, up to 100K
> > >> >> updates per collection daily (roughly). I'm going to create SolrCloud4x on
> > >> >> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
> > >> >> what about zookeeper? It's going to be external ensemble, but is it better
> > >> >> to use same nodes as solr or dedicated micro instances? Zookeeper does not
> > >> >> seem to be resources demanding process, but what would be better in this
> > >> >> case ? To keep it inside of solrcloud or separately (micro instances seem
> > >> >> to be enough here) ?
> > >> >>
> > >> >> Thanks in advance.
> > >> >> Regards.

--
- Luis Cappa