Yes, this is exactly my case. I prefer the 3rd option too. Since I have 2 more instances for my own purposes (SolrCloud4x plus 2 more instances for loading), it will be easier to configure the Zookeeper ensemble (I can use those 2 additional machines plus 1 from SolrCloud) and avoid purchasing and maintaining more instances.
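To sanity-check that plan, here is a minimal sketch of the majority-quorum arithmetic, assuming the usual rule that a ZooKeeper ensemble stays available only while more than half of its servers are up. The function names are my own and purely illustrative; they are not part of any Solr or ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes; the 3-server case is the one discussed here.
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With 3 servers the ensemble survives the loss of any one machine, and a 4th server would not raise that number, which is why odd ensemble sizes are the usual choice.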
On 22 November 2012 10:18, Luis Cappa Banda <luisca...@gmail.com> wrote:

> Hello,
>
> I've been dealing with the same question these days. In architecture terms,
> it's always better to separate services (Solr and Zookeeper, in this case)
> rather than keep them in a single instance. However, when we have to deal
> with cost issues, all of us are quite limited and we must choose the best
> option in terms of architecture, scalability, and single points of failure.
> As I see it, the options are:
>
> 1. Solr servers with embedded Zookeeper.
> 2. Solr servers with external Zookeeper.
> 3. Solr servers with external Zookeeper ensemble.
>
> *Note*: as far as I know, the recommended number of Zookeeper services to
> avoid single points of failure is: *ZkNum = 2 * Numshards - 1*. If you have
>
> The best option is the third one. Reasons:
>
> 1. If one of your Solr servers goes down, the Zookeeper services stay up.
> 2. If one of your Zookeeper services goes down, the Solr servers and the
>    rest of the Zookeeper services stay up.
>
> Considering that option, we have two ways to implement it in production:
>
> 1. Each service (Solr and Zookeeper) on separate machines. Let's imagine
>    that we have 2 shards for a given collection, so we need at least 4 Solr
>    servers to complete the leader-replica configuration. The best option is
>    to deploy them on four Amazon instances, one per server. We need at least
>    3 Zookeeper services in a Zookeeper ensemble configuration. The optimal
>    way to install them is on separate machines (a micro instance will be
>    fine for Zookeeper), so we will have 7 Amazon instances. The reason is
>    that if one machine goes down (a Solr or a Zookeeper one), the other
>    services stay up and your production environment will be safe. However,
>    *for me this is the best case, but it's the most expensive one*, so in
>    my case it is impossible to implement.
>
> 2. As we need at least 4 Solr servers and 3 Zookeeper services up, I would
>    set up three Amazon instances with Solr and Zookeeper, and one with Solr
>    only. So we'll have: 3 complete Amazon instances (Solr + Zookeeper) and
>    1 single Amazon instance (only Solr). If one of them goes down, the
>    production environment will be safe. This architecture is not the best
>    one, as I told you, but I think it is optimal in terms of robustness,
>    single points of failure, and cost.
>
> It would be a pleasure to hear new suggestions from other people who have
> dealt with this kind of issue.
>
> Regards,
>
> - Luis Cappa.
>
> 2012/11/21 Marcin Rzewucki <mrzewu...@gmail.com>
>
> > Yes, I meant the same (not -zkRun). However, I was asking whether it is
> > safe to run the Zookeeper and Solr processes on the same node, or whether
> > separate machines are better.
> >
> > On 21 November 2012 21:18, Rafał Kuć <r....@solr.pl> wrote:
> >
> > > Hello!
> > >
> > > As I said, I wouldn't use the Zookeeper that is embedded in Solr, but
> > > would rather set up a standalone one.
> > >
> > > --
> > > Regards,
> > > Rafał Kuć
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> > >
> > > > First of all: thank you for your answers. Yes, I meant side by side
> > > > configuration. I think the worst case for ZKs here is to lose two of
> > > > them. However, I'm going to use 4 availability zones in the same
> > > > region, so at least this will reduce the risk of losing both of them
> > > > at the same time.
> > > > Regards.
> > > >
> > > > On 21 November 2012 17:06, Rafał Kuć <r....@solr.pl> wrote:
> > > >
> > > >> Hello!
> > > >>
> > > >> Zookeeper by itself is not demanding, but if something happens to
> > > >> the nodes that have Solr on them, you'll lose ZooKeeper too if you
> > > >> have them installed side by side. However, if you have 4 Solr nodes
> > > >> and 3 ZK instances, you can run them side by side.
> > > >>
> > > >> --
> > > >> Regards,
> > > >> Rafał Kuć
> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> > > >>
> > > >> > Separate is generally nice because then you can restart Solr nodes
> > > >> > without consideration for ZooKeeper.
> > > >> >
> > > >> > Performance-wise, I doubt it's a big deal either way.
> > > >> >
> > > >> > - Mark
> > > >> >
> > > >> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> I have 4 Solr collections, 2-3 million documents per collection,
> > > >> >> and up to 100K updates per collection daily (roughly). I'm going
> > > >> >> to create SolrCloud4x on Amazon's m1.large instances (7GB mem,
> > > >> >> 2x2.4GHz cpu each). The question is: what about Zookeeper? It's
> > > >> >> going to be an external ensemble, but is it better to use the
> > > >> >> same nodes as Solr or dedicated micro instances? Zookeeper does
> > > >> >> not seem to be a resource-demanding process, but what would be
> > > >> >> better in this case: to keep it inside SolrCloud or separately
> > > >> >> (micro instances seem to be enough here)?
> > > >> >>
> > > >> >> Thanks in advance.
> > > >> >> Regards.
> >
> > --
> > - Luis Cappa