> the incoming document rate could be as high as 20k/second...

That sounds like a lot of CPU-hungry indexing work. Given the 128 CPU cores available, and looking purely at indexing speed: would you recommend creating a similar number of Solr cores, or does Solr do just as well with a small number of Solr cores and several CPU cores per Solr core, since indexing is multi-threaded?
On Mon, Jun 9, 2014 at 7:19 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 6/8/2014 4:17 PM, shushuai zhu wrote:
> > I would like to get some advice to setup a Solr Cloud on a set of powerful machines. The average size of the documents handled by the Solr Cloud is about 0.5 KB, and the number of documents stored in Solr Cloud could reach billions. When indexing, the incoming document rate could be as high as 20k/second; and the major query operations performed on the Cloud are searching, faceting, and some other aggregations. There will NOT be many concurrent queries (replication factor of 2 may be good enough), but some queries could cover big range of documents.
> >
> > As an example, I have 8 powerful machines (nodes), and each machine (node) has:
> >
> > 16 CPU cores
> > 256GB RAM
> > 48TB physical disk space
> >
> > The Solr Cloud may be setup in following different ways (assuming replication factor is 2):
> >
> > 1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
> > Each machine (node) holds one Solr server (JVM), and each Solr server has one shard.
> >
> > 2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
> > Each machine (node) holds one Solr server (JVM), and each Solr server has 4 shards.
> >
> > 3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
> > Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 shards.
> >
> > 4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
> > Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 shards.
> >
> > 5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
> > Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 shards.
>
> Erick's note is very important. From the information given, we can't even guess about the size of your index. Even if we had that information, there are too many variables to give you any real recommendations.
>
> Also mentioned by Erick: RAM is the single greatest factor affecting Solr performance. If you have enough OS disk cache to fit your index entirely in RAM, performance is likely to be excellent. With 256GB of RAM on eight servers, you're going to have about 2TB of RAM, some of which will be used for Solr itself. If both copies of your index take up 2TB or less in disk space, you're probably going to be OK there. You'd probably be OK up to about 3TB of total index.
>
> The 48TB of disk space is probably serious overkill. I would assume this is twelve 4TB drives. It would be better for performance (without losing redundancy) to use RAID10 with a stripe size of at least 1MB for the storage instead of any other RAID level. It eats up half your raw space for redundancy, but the performance is *excellent*.
>
> The fact that your query volume will be low does give me the ability to tell you one thing: With 16 CPU cores per machine and a low query volume, you'll be able to handle a lot more Solr cores per machine. The extra CPU cores can spend their time reading from Solr cores and speeding up each individual query without worrying about being crushed under hundreds of queries per second.
>
> For a perfect match of CPU cores to Solr cores, you'd do option number 4, so each machine would get 16 Solr cores ... but I think option number 3 might be better, so you have more CPUs than indexes per machine. This gives you a safe capacity of about 32 billion documents, with a maximum total capacity of well over 64 billion documents.
>
> Thanks,
> Shawn
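
For anyone following the numbers in the quoted thread, here is a rough sketch of the arithmetic behind the five layouts. It is only a back-of-the-envelope check, not a sizing tool. Two things in it are my assumptions rather than anything stated above: the "safe" figure of about 1 billion documents per shard (inferred from "32 billion" across 32 shards), and the per-shard ceiling of roughly 2^31 documents, which comes from Lucene using 32-bit document ids (the real limit is slightly lower). All names in the script are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the layouts discussed in the thread.
MACHINES = 8
CPU_CORES_PER_MACHINE = 16
RAM_GB_PER_MACHINE = 256
REPLICATION_FACTOR = 2

# Assumption: ~1 billion docs per shard is treated as "safe" (32 shards -> 32 billion).
SAFE_DOCS_PER_SHARD = 1_000_000_000
# Assumption: approximate hard ceiling per Lucene index (32-bit doc ids);
# the true limit is a little below this value.
APPROX_MAX_DOCS_PER_SHARD = 2**31 - 1

# option number -> (shards, Solr servers/JVMs), as listed in the original post
OPTIONS = {1: (8, 8), 2: (32, 8), 3: (32, 16), 4: (64, 16), 5: (128, 32)}


def summarize(shards, jvms):
    """Return per-machine load and rough document capacity for one layout."""
    solr_cores = shards * REPLICATION_FACTOR  # every shard has 2 copies
    return {
        "solr_cores": solr_cores,
        "solr_cores_per_machine": solr_cores // MACHINES,
        "jvms_per_machine": jvms // MACHINES,
        "safe_docs": shards * SAFE_DOCS_PER_SHARD,
        "max_docs": shards * APPROX_MAX_DOCS_PER_SHARD,
    }


TOTAL_RAM_GB = MACHINES * RAM_GB_PER_MACHINE  # the "about 2TB" in the reply

for number, (shards, jvms) in OPTIONS.items():
    s = summarize(shards, jvms)
    print(
        f"option {number}: {s['solr_cores_per_machine']} Solr cores per machine "
        f"on {CPU_CORES_PER_MACHINE} CPU cores, "
        f"safe ~{s['safe_docs'] / 1e9:.0f}B docs, max ~{s['max_docs'] / 1e9:.1f}B docs"
    )
```

Under these assumptions, option 4 lands on 16 Solr cores per 16-CPU machine (the "perfect match"), while option 3 leaves 8 Solr cores per machine and half the CPUs spare.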