Thanks, Erick. I wonder whether running multiple SolrClouds would help here. With 5 SolrClouds, I would effectively have 5 times as many shards in total. Of course, this introduces issues when I need to aggregate results from different SolrClouds.

Thanks,
Zhifeng

On Tue, Apr 22, 2014 at 12:30 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> You're going to run into resource issues long before you hit 2G
> docs/node I suspect. I've seen from 50M to 300M docs on a single node.
> Fortunately you may well be near the upper end of that since you're
> dealing with log files.
>
> Bottom line here is that you're off into largely uncharted territory
> when you start talking about hundreds of nodes. There's certainly work
> going on to make that work, but you'd be on the bleeding edge.
>
> Best,
> Erick
>
> On Mon, Apr 21, 2014 at 8:55 PM, Zhifeng Wang <zhifeng.wang...@gmail.com>
> wrote:
> > Hi,
> >
> > We are facing a high incoming rate of usually small documents (logs). The
> > incoming rate is initially assumed at 2K/sec but could reach as high as
> > 20K/sec. So a year's worth of data could reach 60G (assuming the rate at
> > 2K/sec) searchable documents.
> >
> > Since a single shard can contain no more than 2G documents, we will need
> > at least 30 shards per year. Considering that we don't want to fill shards
> > to their maximum capacity, the number of shards we need will be
> > considerably higher.
> >
> > My question is whether there is a hard (not possible) or soft (bad
> > performance) limit on the number of shards per SolrCloud. ZooKeeper
> > defaults file size to 1M, so I guess that causes some limit. If I set the
> > value to a larger number, will SolrCloud really scale OK if there are
> > thousands of shards? Or would I be better off using multiple SolrClouds
> > to handle the data (result aggregation is done outside of SolrCloud)?
> >
> > Thanks,
> > Zhifeng
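
For reference, the sizing arithmetic in the thread can be checked with a rough sketch like the one below. The rates (2K/sec and 20K/sec), the 2G per-shard ceiling, and the 50M to 300M docs-per-node range are taken from the messages above; the function and constant names are made up for illustration, and the calculation assumes a constant ingest rate over a 365-day year with no deletes or retention policy.

```python
# Back-of-the-envelope shard sizing using the figures quoted in the thread.
# All names here are illustrative, not part of any Solr API.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

# Per-shard document budgets mentioned in the thread.
HARD_LIMIT = 2_000_000_000      # the "2G" ceiling, rounded as in the thread
PRACTICAL_HIGH = 300_000_000    # upper end of the 50M-300M range
PRACTICAL_LOW = 50_000_000      # lower end of the 50M-300M range


def docs_per_year(rate_per_sec: int) -> int:
    """Documents accumulated in one year at a constant ingest rate."""
    return rate_per_sec * SECONDS_PER_YEAR


def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    """Smallest shard count that keeps every shard at or under the budget."""
    return -(-total_docs // docs_per_shard)  # integer ceiling division


if __name__ == "__main__":
    for rate in (2_000, 20_000):
        total = docs_per_year(rate)
        print(f"{rate}/sec -> {total:,} docs/year")
        for label, budget in (
            ("hard limit (2G)", HARD_LIMIT),
            ("practical high (300M)", PRACTICAL_HIGH),
            ("practical low (50M)", PRACTICAL_LOW),
        ):
            print(f"  {label}: {shards_needed(total, budget):,} shards")
```

At 2K/sec this gives about 63G documents per year, so 32 shards at the 2G ceiling (slightly above the "at least 30" estimate, which rounds the yearly total to 60G) and 211 shards at 300M docs per shard.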