You're going to run into resource issues long before you hit 2G docs/node, I suspect. I've seen anywhere from 50M to 300M docs on a single node. Fortunately you may well be near the upper end of that range since you're dealing with log files.
Bottom line here is that you're off into largely uncharted territory when you start talking about hundreds of nodes. There's certainly work going on to make that work, but you'd be on the bleeding edge.

Best,
Erick

On Mon, Apr 21, 2014 at 8:55 PM, Zhifeng Wang <zhifeng.wang...@gmail.com> wrote:
> Hi,
>
> We are facing a high incoming rate of usually small documents (logs). The
> incoming rate is initially assumed to be 2K/sec but could reach as high as
> 20K/sec. So a year's worth of data could reach 60G searchable documents
> (assuming the 2K/sec rate).
>
> Since a single shard can contain no more than 2G documents, we will need at
> least 30 shards per year. Considering that we don't want to fill shards to
> their maximum capacity, the number of shards we need will be considerably
> higher.
>
> My question is whether there is a hard (not possible) or soft (bad
> performance) limit on the number of shards per SolrCloud. ZooKeeper
> defaults the file size to 1M, so I guess that imposes some limit. If I set
> the value to a larger number, will SolrCloud really scale OK with
> thousands of shards? Or would I be better off using multiple SolrClouds to
> handle the data (result aggregation is done outside of SolrCloud)?
>
> Thanks,
> Zhifeng
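
[Editor's note: a minimal back-of-the-envelope sketch of the sizing math in the question, using only the figures stated there (2K-20K docs/sec) and Lucene's hard per-index limit of 2^31 - 1 documents, which is the "2G documents per shard" ceiling being referenced. As Erick notes, practical per-node capacity is far lower.]

    import math

    # Lucene hard limit: ~2.14 billion docs per index (the "2G" ceiling above).
    LUCENE_MAX_DOCS_PER_SHARD = 2**31 - 1
    SECONDS_PER_YEAR = 365 * 24 * 3600

    for rate in (2000, 20000):  # docs/sec, the low and high estimates in the question
        docs_per_year = rate * SECONDS_PER_YEAR
        min_shards = math.ceil(docs_per_year / LUCENE_MAX_DOCS_PER_SHARD)
        print("%d/sec -> %.0fB docs/year, >= %d shards (hard minimum)"
              % (rate, docs_per_year / 1e9, min_shards))

At 2K/sec this gives roughly 63 billion docs/year and a hard minimum of about 30 shards, matching the question; at 20K/sec it is closer to 300 shards before any headroom. The 1M ZooKeeper limit mentioned in the question is the default znode size cap (the jute.maxbuffer setting), which can be raised, but doing so with thousands of shards is exactly the bleeding-edge territory Erick describes.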