Hello,

I've been dealing with the same question these days. In architectural
terms, it's always better to separate services (Solr and Zookeeper, in this
case) rather than keep them in a single instance. However, when we have to
deal with cost issues, all of us are quite limited and we must choose the
best option in terms of architecture, scalability and single points of
failure. As I see it, the options are:


*1.* Solr servers with embedded Zookeeper.
*2.* Solr servers with an external Zookeeper.
*3.* Solr servers with an external Zookeeper ensemble.
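
To make the difference between these options concrete, here is a small Python sketch of how a Solr 4.x node would be pointed at Zookeeper in each case: with embedded Zookeeper the node is started with `-DzkRun`, while with an external Zookeeper it gets a `-DzkHost` connection string listing every member as `host:port`. The host names `zk1`..`zk3`, the default client port 2181 and the helper function names are assumptions for illustration, and a real start command needs more flags than shown here.

```python
from typing import List, Optional, Sequence

DEFAULT_ZK_PORT = 2181  # ZooKeeper's conventional client port


def zk_host_string(hosts: Sequence[str], port: int = DEFAULT_ZK_PORT) -> str:
    """Join ZooKeeper hosts into a connection string like 'zk1:2181,zk2:2181'."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    return ",".join(f"{host}:{port}" for host in hosts)


def solr_start_command(zk_hosts: Optional[Sequence[str]] = None) -> List[str]:
    """Simplified Solr 4.x (Jetty example) start command; other flags omitted."""
    if zk_hosts:
        # Options 2 and 3: a single external ZooKeeper or a whole ensemble.
        return ["java", f"-DzkHost={zk_host_string(zk_hosts)}", "-jar", "start.jar"]
    # Option 1: ZooKeeper runs embedded inside this Solr JVM.
    return ["java", "-DzkRun", "-jar", "start.jar"]


print(" ".join(solr_start_command()))
print(" ".join(solr_start_command(["zk1"])))
print(" ".join(solr_start_command(["zk1", "zk2", "zk3"])))
```

Listing every ensemble member matters: the Zookeeper client inside Solr can then reconnect to any surviving server instead of depending on a single host.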

*Note*: to avoid single points of failure, a Zookeeper ensemble must keep a
majority (quorum) of its servers alive, so tolerating F failed servers
requires *ZkNum = 2 * F + 1* servers; the number of shards does not change
this. If you have
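
Zookeeper only keeps serving while a strict majority (quorum) of the ensemble is up, so its size depends on how many Zookeeper failures you want to survive rather than on the number of shards. The arithmetic can be sketched in a few lines of Python (the function names are hypothetical):

```python
def quorum(ensemble_size: int) -> int:
    """Number of servers that must be up: a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a quorum is still available."""
    return ensemble_size - quorum(ensemble_size)


def ensemble_size_for(failures: int) -> int:
    """Smallest ensemble that survives `failures` lost servers: 2 * F + 1."""
    return 2 * failures + 1


for n in range(1, 6):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

An even-sized ensemble buys nothing extra: 4 servers still tolerate only one failure, which is why 3 or 5 servers are the usual sizes.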


The best option is the third one. Reasons:

*1.* If one of your Solr servers goes down, the Zookeeper services stay up.
*2.* If one of your Zookeeper services goes down, the Solr servers and the
rest of the Zookeeper services stay up.

Considering that option, we have two ways to implement it in production:

*1.* Each service (Solr and Zookeeper) on a separate machine. Let's imagine
that we have 2 shards for a given collection, so we need at least 4 Solr
servers to complete the leader-replica configuration. The best option is to
deploy them on four Amazon instances, one per server. We also need at least
3 Zookeeper services in a Zookeeper ensemble configuration. The optimal way
to install them is on separate machines (a micro instance will be fine for
Zookeeper), so we will have 7 Amazon instances. The reason is that if one
machine goes down (a Solr or a Zookeeper one), the other services stay up
and your production environment will be safe. However, *for me this is the
best case, but it's also the most expensive one*, so in my case it is
impossible to put into practice.

*2.* As we need at least 4 Solr servers and 3 Zookeeper services up, I
would install Solr and Zookeeper together on three Amazon instances, and
only Solr on a fourth one. So we'll have 3 complete Amazon instances (Solr +
Zookeeper) and 1 single Amazon instance (only Solr). If one of them goes
down, the production environment will be safe, provided that the leader and
the replica of each shard live on different instances. As I told you, this
architecture is not the best one, but I think it is optimal in terms of
robustness, single points of failure and costs.
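
The two deployments above can be compared with a small failure simulation: take every combination of k machines going down and check that a Zookeeper quorum survives and that every shard still has at least one live Solr copy. This is a Python sketch under assumptions not stated in the mail: 2 shards (s1, s2) with one leader and one replica each, hypothetical machine names, and, in the co-located layout, the two copies of each shard placed on different instances.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Layout = Dict[str, Set[str]]  # machine name -> services running on it

# Hypothetical machine names; "s1"/"s2" are Solr cores of shard 1/2, "zk" a ZooKeeper server.
LAYOUTS: Dict[str, Layout] = {
    "separate (7 instances)": {
        "solr-a": {"s1"}, "solr-b": {"s1"}, "solr-c": {"s2"}, "solr-d": {"s2"},
        "zk-1": {"zk"}, "zk-2": {"zk"}, "zk-3": {"zk"},
    },
    "co-located (4 instances)": {
        "m1": {"s1", "zk"}, "m2": {"s2", "zk"}, "m3": {"s1", "zk"}, "m4": {"s2"},
    },
}


def healthy(layout: Layout, down: Set[str]) -> bool:
    """True if a ZooKeeper quorum is up and every shard has a live Solr core."""
    alive = [svcs for machine, svcs in layout.items() if machine not in down]
    zk_total = sum("zk" in svcs for svcs in layout.values())
    zk_alive = sum("zk" in svcs for svcs in alive)
    shards = {s for svcs in layout.values() for s in svcs if s != "zk"}
    every_shard_served = all(any(shard in svcs for svcs in alive) for shard in shards)
    return zk_alive >= zk_total // 2 + 1 and every_shard_served


def survivable(layout: Layout, k: int) -> Tuple[int, int]:
    """(survivable cases, total cases) over every way of losing k machines."""
    cases = [set(c) for c in combinations(layout, k)]
    return sum(healthy(layout, c) for c in cases), len(cases)


for name, layout in LAYOUTS.items():
    ok1, n1 = survivable(layout, 1)
    ok2, n2 = survivable(layout, 2)
    print(f"{name}: 1 down -> {ok1}/{n1} ok, 2 down -> {ok2}/{n2} ok")
```

Under these assumptions both layouts survive any single machine failure. The 7-instance layout survives 16 of the 21 possible double failures, while the 4-instance layout survives only 2 of 6, because losing two of the three Zookeeper hosts, or both copies of a shard, takes the cluster down.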


It would be a pleasure to hear new suggestions from other people who have
dealt with this kind of issue.

Regards,


- Luis Cappa.


2012/11/21 Marcin Rzewucki <mrzewu...@gmail.com>

> Yes, I meant the same (not -zkRun). However, I was asking whether it is
> safe to have zookeeper and solr processes running on the same node, or
> whether it is better to run them on different machines.
>
> On 21 November 2012 21:18, Rafał Kuć <r....@solr.pl> wrote:
>
> > Hello!
> >
> > As I said, I wouldn't use the Zookeeper that is embedded into Solr, but
> > would rather set up a standalone one.
> >
> > --
> > Regards,
> >  Rafał Kuć
> >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> >
> > > First of all: thank you for your answers. Yes, I meant side by side
> > > configuration. I think the worst case for ZKs here is to lose two of
> > > them. However, I'm going to use 4 availability zones in the same
> > > region, so at least this will reduce the risk of losing both of them
> > > at the same time.
> > > Regards.
> >
> > > On 21 November 2012 17:06, Rafał Kuć <r....@solr.pl> wrote:
> >
> > >> Hello!
> > >>
> > >> Zookeeper by itself is not demanding, but if something happens to your
> > >> nodes that have Solr on them, you'll lose ZooKeeper too if you have
> > >> them installed side by side. However, if you have 4 Solr nodes and
> > >> 3 ZK instances, you can get them running side by side.
> > >>
> > >>
> > >> > Separate is generally nice because then you can restart Solr nodes
> > >> > without consideration for ZooKeeper.
> > >>
> > >> > Performance-wise, I doubt it's a big deal either way.
> > >>
> > >> > - Mark
> > >>
> > >> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
> > >>
> > >> >> Hi,
> > >> >>
> > >> >> I have 4 solr collections, 2-3 million documents per collection, and
> > >> >> up to 100K updates per collection daily (roughly). I'm going to create
> > >> >> a SolrCloud 4.x cluster on Amazon's m1.large instances (7GB mem,
> > >> >> 2x2.4GHz cpu each). The question is: what about zookeeper? It's going
> > >> >> to be an external ensemble, but is it better to use the same nodes as
> > >> >> solr or dedicated micro instances? Zookeeper does not seem to be a
> > >> >> resource-demanding process, but what would be better in this case: to
> > >> >> keep it inside of solrcloud or separately (micro instances seem to be
> > >> >> enough here)?
> > >> >>
> > >> >> Thanks in advance.
> > >> >> Regards.
> > >>
> > >>
> >
> >
>


