> the incoming document rate could be as high as 20k/second...
That sounds like a lot of CPU-hungry indexing work. Given the 128 CPU cores
available, from an indexing-speed perspective: would you recommend creating
a similar number of Solr cores, or does Solr do just as well with a small
number of Solr cores and several CPU cores per Solr core, since indexing is
multi-threaded?
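
For reference, here is a rough sketch of how the five layouts quoted below
map onto the hardware. It assumes the Solr cores (shards times replication
factor) are spread evenly over the 8 machines, which the original post does
not state explicitly; the function names are mine, for illustration only.

```python
# Hardware and replication figures taken from the original post.
MACHINES = 8
CPU_CORES_PER_MACHINE = 16
REPLICATION_FACTOR = 2

# Option number -> number of shards, as listed in the original post.
LAYOUTS = {1: 8, 2: 32, 3: 32, 4: 64, 5: 128}


def solr_cores_per_machine(shards):
    """Solr cores hosted on each machine, assuming an even spread."""
    return shards * REPLICATION_FACTOR / MACHINES


def cpu_cores_per_solr_core(shards):
    """CPU cores available to each Solr core on a machine."""
    return CPU_CORES_PER_MACHINE / solr_cores_per_machine(shards)


for option, shards in LAYOUTS.items():
    per_machine = solr_cores_per_machine(shards)
    ratio = cpu_cores_per_solr_core(shards)
    print(f"option {option}: {per_machine:g} Solr cores per machine, "
          f"{ratio:g} CPU cores per Solr core")
```

Under that assumption, option 4 is the one-to-one match between CPU cores
and Solr cores, and options 2 and 3 leave two CPU cores per Solr core.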


On Mon, Jun 9, 2014 at 7:19 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 6/8/2014 4:17 PM, shushuai zhu wrote:
> > I would like to get some advice on setting up a Solr Cloud on a set of
> > powerful machines. The average size of the documents handled by the Solr
> > Cloud is about 0.5 KB, and the number of documents stored in Solr Cloud
> > could reach billions. When indexing, the incoming document rate could be as
> > high as 20k/second; and the major query operations performed on the Cloud
> > are searching, faceting, and some other aggregations. There will NOT be
> > many concurrent queries (replication factor of 2 may be good enough), but
> > some queries could cover a big range of documents.
> >
> > As an example, I have 8 powerful machines (nodes), and each machine
> > (node) has:
> >
> > 16 CPU cores
> > 256GB RAM
> > 48TB physical disk space
> >
> > The Solr Cloud may be set up in the following different ways (assuming
> > replication factor is 2):
> >
> > 1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
> > Each machine (node) holds one Solr server (JVM), and each Solr server
> > has one shard.
> >
> > 2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
> > Each machine (node) holds one Solr server (JVM), and each Solr server
> > has 4 shards.
> >
> > 3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
> > Each machine (node) holds 2 Solr servers (JVMs), and each Solr server
> > has 2 shards.
> >
> > 4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
> > Each machine (node) holds 2 Solr servers (JVMs), and each Solr server
> > has 4 shards.
> >
> > 5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
> > Each machine (node) holds 4 Solr servers (JVMs), and each Solr server
> > has 4 shards.
>
> Erick's note is very important.  From the information given, we can't
> even guess about the size of your index.  Even if we had that
> information, there are too many variables to give you any real
> recommendations.
>
> Also mentioned by Erick:  RAM is the single greatest factor affecting
> Solr performance.  If you have enough OS disk cache to fit your index
> entirely in RAM, performance is likely to be excellent.  With 256GB of
> RAM on eight servers, you're going to have about 2TB of RAM, some of
> which will be used for Solr itself.  If both copies of your index take
> up 2TB or less in disk space, you're probably going to be OK there.
> You'd probably be OK up to about 3TB of total index.
>
> The 48TB of disk space is probably serious overkill.  I would assume
> this is twelve 4TB drives.  It would be better for performance (without
> losing redundancy) to use RAID10 with a stripe size of at least 1MB for
> the storage instead of any other RAID level.  It eats up half your raw
> space for redundancy, but the performance is *excellent*.
>
> The fact that your query volume will be low does give me the ability to
> tell you one thing: With 16 CPU cores per machine and a low query
> volume, you'll be able to handle a lot more Solr cores per machine.  The
> extra CPU cores can spend their time reading from Solr cores and
> speeding up each individual query without worrying about being crushed
> under hundreds of queries per second.
>
> For a perfect match of CPU cores to Solr cores, you'd do option number
> 4, so each machine would get 16 Solr cores ... but I think option number
> 3 might be better, so you have more CPUs than indexes per machine.  This
> gives you a safe capacity of about 32 billion documents, with a maximum
> total capacity of well over 64 billion documents.
>
> Thanks,
> Shawn
>
>
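
The RAM and capacity figures quoted above can be reproduced with some quick
arithmetic. This is only a sketch under my own assumptions: I read the
"safe" figure as roughly 1 billion documents per shard, and the maximum as
Lucene's per-index limit of roughly 2^31 documents. Neither per-shard number
is stated in the quoted message, and the names below are illustrative.

```python
# Figures from the thread.
MACHINES = 8
RAM_GB_PER_MACHINE = 256
SHARDS_OPTION_3 = 32
PEAK_DOCS_PER_SECOND = 20_000

# Assumptions (not stated in the quoted message).
SAFE_DOCS_PER_SHARD = 1_000_000_000    # assumed comfortable size per shard
MAX_DOCS_PER_SHARD = 2**31 - 1         # approximate Lucene per-index limit

SECONDS_PER_DAY = 86_400


def total_ram_gb(machines, ram_gb_per_machine):
    """Total RAM across the cluster, in GB."""
    return machines * ram_gb_per_machine


def capacity(shards, docs_per_shard):
    """Total document capacity for a given per-shard size."""
    return shards * docs_per_shard


def days_to_fill(total_docs, docs_per_second):
    """Days needed to reach total_docs at a sustained indexing rate."""
    return total_docs / (docs_per_second * SECONDS_PER_DAY)


ram = total_ram_gb(MACHINES, RAM_GB_PER_MACHINE)
safe = capacity(SHARDS_OPTION_3, SAFE_DOCS_PER_SHARD)
maximum = capacity(SHARDS_OPTION_3, MAX_DOCS_PER_SHARD)

print(f"total RAM: {ram} GB")
print(f"safe capacity: {safe:,} documents")
print(f"maximum capacity: {maximum:,} documents")
print(f"days to reach safe capacity at peak rate: "
      f"{days_to_fill(safe, PEAK_DOCS_PER_SECOND):.1f}")
```

The last line assumes the 20k/second peak is sustained around the clock,
which is a worst case rather than something the original post claims.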
