Re: huge shards (300GB each) and load balancing

Dmitry Kan Wed, 08 Jun 2011 06:27:57 -0700

Hi Upayavira,

Thanks for sharing insights and experience on this.


With 6 shards at the moment, it is pretty hard (almost impossible) to keep
them all on a single box, which is why we decided to shard across machines.
On the other hand, we have never tried a multicore architecture, so that's a
good point, thanks.

On the indexing side, we do it rather straightforwardly, that is, by updating
the online shards directly. We hope to improve this with an [offline update /
HTTP swap] system, since updating online 200GB shards already at times
produces OOM errors, freezes and other issues.
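
To make the [offline update / HTTP swap] idea concrete, here is a rough
sketch of the swap step only. It assumes each shard would run as a multicore
instance (which we have not tried yet) and uses the CoreAdmin SWAP action;
the host name and the core names "live" and "staging" are made up for
illustration.

```python
from urllib.parse import urlencode


def swap_url(solr_base, live_core, staging_core):
    """Build a CoreAdmin SWAP request URL that exchanges two cores.

    The idea: index into staging_core offline, then swap it with
    live_core so searches never hit a core that is being rewritten.
    """
    params = urlencode({"action": "SWAP", "core": live_core, "other": staging_core})
    return f"{solr_base.rstrip('/')}/admin/cores?{params}"


# Hypothetical host and core names, for illustration only.
print(swap_url("http://shard1.example.com:8983/solr", "live", "staging"))
```

Issuing an HTTP GET against the printed URL would perform the swap; the
offline indexing into the staging core has to be finished (and committed)
before that.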



Does anyone have other experience with, or pointers to, load balancer
software that has been tried with SOLR?
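
For reference, since no sessions are needed, what the balancer has to do on
the search side is quite simple: rotate requests over several identical
front ends, each of which fans the query out through the `shards`
parameter. A minimal sketch of that rotation, with hypothetical host names
(it does no health checking, which a real balancer would need):

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical front-end and shard addresses, for illustration only.
FRONT_ENDS = ["fe1.example.com:8983", "fe2.example.com:8983"]
SHARDS = ["shard1.example.com:8983/solr", "shard2.example.com:8983/solr"]


def query_urls(q, front_ends=FRONT_ENDS, shards=SHARDS):
    """Yield distributed-search URLs, rotating over the front ends.

    Every URL carries the same `shards` list, so any front end can
    serve any request (stateless round robin).
    """
    params = urlencode({"q": q, "shards": ",".join(shards)}, safe=":/,")
    for host in cycle(front_ends):
        yield f"http://{host}/solr/select?{params}"


urls = query_urls("solr")
print(next(urls))  # goes to the first front end
print(next(urls))  # goes to the second front end
```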

Dmitry

On Wed, Jun 8, 2011 at 12:32 PM, Upayavira <u...@odoko.co.uk> wrote:

>
>
> On Wed, 08 Jun 2011 10:42 +0300, "Dmitry Kan" <dmitry....@gmail.com>
> wrote:
> > Hello list,
> >
> > Thanks for attending to my previous questions so far, have learnt a lot.
> > Here is another one, I hope it will be interesting to answer.
> >
> >
> >
> > We run our SOLR shards and front end SOLR on the Amazon high-end machines.
> > Currently we have 6 shards with around 200GB in each. Currently we have only
> > one front end SOLR which, given a client query, redirects it to all the
> > shards. Our shards are constantly growing, data is at times reindexed (in
> > batches, which is done by removing a decent chunk before replacing it with
> > updated data), constant stream of new data is coming every hour (usually
> > hits the latest shard in time, but can also hit other shards, which have
> > older data). Since the front end SOLR has started to be a SPOF, we are
> > thinking about setting up some sort of load balancer.
> >
> > 1) do you think ELB from Amazon is a good solution for starters? We don't
> > need to maintain sessions between SOLR and client.
> > 2) What other load balancers have been used specifically with SOLR?
> >
> >
> > Overall: does SOLR scale to such size (200GB in an index) and what can be
> > recommended as next step -- resharding (cutting existing shards to smaller
> > chunks), replication?
>
> Really, it is going to be up to you to work out what works in your
> situation. You may be reaching the limit of what a Lucene index can
> handle, I don't know. If your query traffic is low, you might find that
> two 100GB cores in a single instance perform better. But then, maybe
> not! Or two 100GB shards on smaller Amazon hosts. But then, maybe not!
> :-)
>
> The principal issue with Amazon's load balancers (at least when I was
> using them last year) is that the ports that they balance need to be
> public. You can't use an Amazon load balancer as an internal service
> within a security group. For a service such as Solr, that can be a bit
> of a killer.
>
> If they've fixed that issue, then they'd work fine (I used them quite
> happily in another scenario).
>
> When looking at resolving single points of failure, handling search is
> pretty easy (as you say, stateless load balancer). You will need to give
> more attention though to how you handle it regarding indexing.
>
> Hope that helps a bit!
>
> Upayavira
>
>
>
>
>
> ---
> Enterprise Search Consultant at Sourcesense UK,
> Making Sense of Open Source
>
>
