Hi, I am curious about the impact of having more than 2G docs in a core. We plan to have 5G docs per core.

Please give me some suggestions on how to plan the number of docs per core. Thanks.
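For reference, here is the rough shard math I have in mind, based on the numbers in the quoted thread below. This is only a sketch: the ~2.147 billion figure is Lucene's hard per-index document limit, and fill_factor is just an assumed safety margin, not anything Solr enforces.

import math

# Back-of-the-envelope shard sizing for the ingest rates discussed in this thread.
# Assumptions (illustrative only): Lucene's hard cap of roughly 2.147 billion
# documents per index/shard, and a fill_factor describing how full we are
# willing to let any one shard get.

LUCENE_MAX_DOCS_PER_SHARD = 2_147_483_519   # Lucene's per-index document limit
SECONDS_PER_YEAR = 365 * 24 * 3600

def shards_needed(docs_per_second, years=1.0, fill_factor=0.5):
    """Minimum shard count for a given ingest rate over a given period."""
    total_docs = docs_per_second * SECONDS_PER_YEAR * years
    capacity_per_shard = LUCENE_MAX_DOCS_PER_SHARD * fill_factor
    return math.ceil(total_docs / capacity_per_shard)

# 2K docs/sec at full capacity reproduces the "at least 30 shards per year"
# figure from the quoted mail below; leaving headroom pushes it well higher.
print(shards_needed(2_000, fill_factor=1.0))   # -> 30
print(shards_needed(2_000))                    # -> 59  (50% headroom)
print(shards_needed(20_000))                   # -> 588 (20K docs/sec, 50% headroom)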
Sent from my iPhone

On 2014-4-22, at 12:30, Erick Erickson <erickerick...@gmail.com> wrote:

> You're going to run into resource issues long before you hit 2G
> docs/node, I suspect. I've seen from 50M to 300M docs on a single node.
> Fortunately you may well be near the upper end of that since you're
> dealing with log files.
>
> Bottom line here is that you're off into largely uncharted territory
> when you start talking about hundreds of nodes. There's certainly work
> going on to make that work, but you'd be on the bleeding edge.
>
> Best,
> Erick
>
> On Mon, Apr 21, 2014 at 8:55 PM, Zhifeng Wang <zhifeng.wang...@gmail.com> wrote:
>> Hi,
>>
>> We are facing a high incoming rate of usually small documents (logs). The
>> incoming rate is initially assumed at 2K/sec but could reach as high as
>> 20K/sec. So a year's worth of data could reach 60G searchable documents
>> (assuming the 2K/sec rate).
>>
>> Since a single shard can contain no more than 2G documents, we will need at
>> least 30 shards per year. Considering that we don't want to fill shards to
>> their maximum capacity, the number of shards we need will be considerably
>> higher.
>>
>> My question is whether there is a hard (not possible) or soft (bad
>> performance) limit on the number of shards per SolrCloud. ZooKeeper
>> defaults the file size to 1M, so I guess that imposes some limit. If I set
>> the value to a larger number, will SolrCloud really scale OK with thousands
>> of shards? Or would I be better off using multiple SolrClouds to handle the
>> data (result aggregation is done outside of SolrCloud)?
>>
>> Thanks,
>> Zhifeng