Hi,

We're in the development phase of a new application, and the dev team
currently leans towards running Solr (4.9) in AWS without ZooKeeper. The
theory is that we can add nodes to our load balancer quickly and
programmatically by taking a dump of the indexes from an existing node and
copying it over to the new one. A RESTful API would let other applications
talk to Solr without each of them having to use SolrJ. Data ingestion
happens nightly in bulk by way of ActiveMQ; each server subscribes to it
and pulls its own copy of the indexes. Incremental updates during the day
are very few, but we would need some mechanism for getting a new server to
'catch up' to the live servers before making it active in the load
balancer.
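
To make the 'catch up' step concrete, here is a rough sketch of the kind of
gate I have in mind before a node goes into the load balancer. Everything in
it is hypothetical: the dict fields (host, num_docs), the function names, and
the idea of comparing document counts are assumptions for illustration, not
an existing Solr or AWS API. How the counts get collected is left out.

```python
# Hypothetical readiness gate: a new node is only eligible for the load
# balancer once its document count is close enough to the live nodes.

def is_caught_up(new_node, live_nodes, max_doc_lag=0):
    """Return True if new_node trails the fullest live node by at most max_doc_lag docs."""
    if not live_nodes:
        # Nothing to compare against, so treat the node as not ready.
        return False
    target = max(node["num_docs"] for node in live_nodes)
    return target - new_node["num_docs"] <= max_doc_lag


def nodes_to_activate(candidates, live_nodes, max_doc_lag=0):
    """Return the hosts of candidate nodes that pass the catch-up check."""
    return [c["host"] for c in candidates if is_caught_up(c, live_nodes, max_doc_lag)]


if __name__ == "__main__":
    live = [{"host": "solr-1", "num_docs": 1000}, {"host": "solr-2", "num_docs": 1000}]
    fresh = [{"host": "solr-3", "num_docs": 990}, {"host": "solr-4", "num_docs": 1000}]
    print(nodes_to_activate(fresh, live))
```

A document count is a crude signal (it misses updates that replace existing
documents), so a real check would probably compare something stronger.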

The only hurdle I see so far is the data set size vs. heap size. If the
index grows too large, we have to increase the heap size, which could lead
to longer GC pauses. Servers could pop in and out of the load balancer if a
major GC leaves them unavailable for too long.

Current stats:
11 GB of data (and growing)
4 GB Java heap
4 CPU, 16 GB RAM nodes (maybe more needed?)
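
As a back-of-the-envelope check on those numbers, the sketch below works out
how much RAM is left on a node once the heap and the index are accounted for.
It assumes the index files are held in the OS file cache rather than in the
Java heap, and it assumes a 1 GB reserve for the OS and other processes; both
are assumptions for illustration, not measurements from our nodes. The
function name and parameters are made up.

```python
# Hypothetical sizing arithmetic: RAM left over after the Java heap, an OS
# reserve, and a fully cached index. A negative result means the index no
# longer fits in memory alongside the heap.

def cache_headroom_gb(ram_gb, heap_gb, index_gb, os_reserve_gb=1.0):
    """Return the GB of RAM remaining after heap, OS reserve, and index."""
    return ram_gb - heap_gb - os_reserve_gb - index_gb


if __name__ == "__main__":
    # Current stats: 16 GB RAM, 4 GB heap, 11 GB index.
    print(cache_headroom_gb(16, 4, 11))  # prints 0.0
```

Under those assumptions the current nodes have no headroom left, so any
growth in the data or the heap would push part of the index out of memory.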

All thoughts are welcomed.

Thanks.
-- 
*Joel Cohen*
Devops Engineer

*GrubHub Inc.*
*jco...@grubhub.com <jco...@grubhub.com>*
646-527-7771
1065 Avenue of the Americas
15th Floor
New York, NY 10018