Thanks for the input. For how long will the 'old style' of replication be
supported? Is it set to go away in Solr 5? I don't want to be stuck on an
old version because we designed our application the wrong way.


On Mon, Aug 4, 2014 at 10:22 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Hi Joel,
>
> You're sort of describing the classic replication scenario, which you can
> get started on by reading this:
> http://wiki.apache.org/solr/SolrReplication
>
> Although I believe this is handled in the reference guide, too.
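> For reference, classic replication is just a ReplicationHandler stanza in
> solrconfig.xml on each node. A minimal sketch (the host and core names here
> are made up; see the wiki page above for the full option list):

```xml
<!-- On the master: publish a new index version after commits and on startup -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master for new index generations -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```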
>
> Generally speaking, the issues you mention come up whenever you run Solr at
> scale, no matter how you replicate. Proper GC tuning is a must; you can
> seriously diminish the impact of GC pauses with some tuning.
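> As an illustration only (a CMS setup typical for Java 7 and a 4 GB heap;
> the log path is made up, and you should tune against your own GC logs
> rather than copy these values):

```shell
# Start Solr with a fixed-size 4 GB heap, CMS collection, and GC logging.
java -Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -Xloggc:/var/log/solr/gc.log \
     -jar start.jar
```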
>
> Etsy has done some interesting things regarding implementing an API that's
> resilient to garbage collecting nodes. Take a look at this:
>
> http://www.lucenerevolution.org/sites/default/files/Living%20with%20Garbage.pdf
>
>
> Michael Della Bitta
>
> Applications Developer
>
> appinions inc.
>
> On Fri, Aug 1, 2014 at 10:48 AM, Joel Cohen <jco...@grubhub.com> wrote:
>
> > Hi,
> >
> > We're in the development phase of a new application, and the current dev
> > team mindset leans towards running Solr (4.9) in AWS without ZooKeeper.
> > The theory is that we can add nodes quickly to our load balancer
> > programmatically, get a dump of the indexes from another node, and copy
> > them over to the new one. A RESTful API would handle other applications
> > talking to Solr without each of them needing to use SolrJ. Data
> > ingestion happens nightly in bulk by way of ActiveMQ, which each server
> > subscribes to so it can build its own copy of the indexes. Incremental
> > updates are very few during the day, but we would have some mechanism
> > for getting a new server to 'catch up' to the live servers before
> > making it active in the load balancer.
> >
> > The only thing so far that I see as a hurdle here is data set size vs.
> > heap size. If the index grows too large, then we have to increase the
> > heap size, which could lead to longer GC pauses. Servers could pop in
> > and out of the load balancer if they are unavailable for too long when
> > a major GC happens.
> >
> > Current stats:
> > 11 GB of data (and growing)
> > 4 GB Java heap
> > 4-CPU, 16 GB RAM nodes (maybe more needed?)
> >
> > All thoughts are welcomed.
> >
> > Thanks.
> > --
> > *Joel Cohen*
> > Devops Engineer
> > *GrubHub Inc.*
> >
>



-- 
*Joel Cohen*
Senior Devops Engineer

*GrubHub Inc.*
