Thanks, Erick.

I wonder whether having multiple SolrClouds helps in this matter. If I have
5 SolrClouds, that will effectively boost the number of shards 5 times. Of
course this introduces issues when I need to aggregate results from
different SolrClouds.
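
As a rough sketch of what that would involve, here is a back-of-the-envelope check of the shard math from the thread, plus a minimal score-based merge of already-ranked result pages from several clusters (plain Python, outside SolrCloud; the per-cluster result shape with "id"/"score" fields is my assumption, not a Solr API):

```python
import heapq
import itertools
import math

# Back-of-the-envelope shard count: ~2K docs/sec for a year,
# against Lucene's hard per-shard limit of 2^31 - 1 documents.
DOCS_PER_YEAR = 2_000 * 86_400 * 365          # ~63 billion docs
MAX_DOCS_PER_SHARD = 2**31 - 1                # Lucene hard limit
MIN_SHARDS = math.ceil(DOCS_PER_YEAR / MAX_DOCS_PER_SHARD)

def merge_ranked(result_sets, rows=10):
    """Merge per-cluster result pages, each already sorted by
    descending score, into one global top-`rows` list."""
    merged = heapq.merge(*result_sets, key=lambda doc: -doc["score"])
    return list(itertools.islice(merged, rows))

if __name__ == "__main__":
    cloud_a = [{"id": "a1", "score": 9.0}, {"id": "a2", "score": 3.0}]
    cloud_b = [{"id": "b1", "score": 7.0}, {"id": "b2", "score": 5.0}]
    print(MIN_SHARDS)  # at least 30 shards, matching the estimate below
    print([d["id"] for d in merge_ranked([cloud_a, cloud_b], rows=3)])
```

One caveat with this approach: scores from separate SolrClouds are not directly comparable (each cluster computes relevance from its own term statistics), so a real aggregator would need to normalize or re-rank rather than merge raw scores as done here.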

Thanks,
Zhifeng


On Tue, Apr 22, 2014 at 12:30 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> You're going to run into resource issues long before you hit 2G
> docs/node, I suspect. I've seen from 50M to 300M docs on a single node.
> Fortunately, you may well be near the upper end of that since you're
> dealing with log files.
>
> Bottom line here is that you're off into largely uncharted territory
> when you start talking about hundreds of nodes. There's certainly work
> going on to make that work, but you'd be on the bleeding edge.
>
> Best,
> Erick
>
> On Mon, Apr 21, 2014 at 8:55 PM, Zhifeng Wang <zhifeng.wang...@gmail.com>
> wrote:
> > Hi,
> >
> > We are facing a high incoming rate of usually small documents (logs). The
> > incoming rate is initially assumed to be 2K/sec but could reach as high as
> > 20K/sec, so a year's worth of data could reach roughly 60G searchable
> > documents (at the 2K/sec rate).
> >
> > Since a single shard can contain no more than 2G documents, we will need
> > at least 30 shards per year. Considering that we don't want shards filled
> > to their maximum capacity, the number of shards we need will be
> > considerably higher.
> >
> > My question is whether there is a hard (not possible) or soft (bad
> > performance) limit on the number of shards per SolrCloud. ZooKeeper's
> > default znode size limit is 1MB, so I guess that imposes some limit. If I
> > set the value to a larger number, will SolrCloud really scale OK with
> > thousands of shards? Or would I be better off using multiple SolrClouds
> > to handle the data (result aggregation is done outside of SolrCloud)?
> >
> > Thanks,
> > Zhifeng
>