Well, it's really impossible to say; you have to prototype. Here's something that explains this a bit: https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
This is a major undertaking. Your question is simply impossible to answer without prototyping as in the link above; anything else is guesswork, and at this scale being wrong is expensive.

So my advice would be to test on a small "cluster", say a 2-shard system, see what kind of performance you can get with your data, your queries, etc., and extrapolate from there (there's a rough back-of-the-envelope sketch below your quoted mail just to frame that testing). Perhaps work with your client on a limited-scope proof-of-concept. Plan on spending some time tuning even the small cluster to get enough answers to form a go/no-go decision.

Best,
Erick

On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
<dominique.bej...@eolya.fr> wrote:
> One of our customers needs to index 15 billion documents in a collection.
> As this volume is not usual for me, I need some advice about SolrCloud
> sizing (how many servers, nodes, shards, replicas, how much memory, ...).
>
> Some inputs:
>
> - Collection size: 15 billion documents
> - Collection update: 8 million new documents / day + 8 million
>   deleted documents / day
> - Updates occur during the night, without queries
> - Queries occur during the day, without updates
> - Document size is nearly 300 bytes
> - Document fields are mainly strings, including one date field
> - The same terms will occur several times for a given field (from 10 to
>   100,000)
> - Queries will use a date period and a filter query on one or more fields
> - 10,000 queries / minute
> - Expected response time < 500 ms
> - 1 billion documents indexed = 5 GB index size
> - No SSD drives
>
> So, what is your advice about:
>
> # of shards: 15 billion documents -> 16 shards?
> # of replicas?
> # of nodes = # of shards?
> Heap memory per node?
> Direct memory per node?
>
> Thank you for your advice.
>
> Dominique
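
P.S. Here's the sketch I mentioned, just arithmetic over the numbers in your mail to frame what the 2-shard prototype needs to answer. It is not a recommendation: the 16-shard figure is the one from your mail, and the replication factors of 2 and 3 are assumptions I picked purely to illustrate the math.

    # Back-of-the-envelope sizing from the numbers in Dominique's mail.
    # Shard and replica counts are illustrative assumptions, not advice;
    # only the prototype described above can validate them.

    TOTAL_DOCS      = 15_000_000_000   # collection size
    GB_PER_BILLION  = 5                # "1 billion documents indexed = 5 GB index size"
    QUERIES_PER_MIN = 10_000
    DAILY_ADDS      = 8_000_000
    DAILY_DELETES   = 8_000_000

    def sketch(num_shards, replication_factor):
        total_index_gb  = TOTAL_DOCS / 1_000_000_000 * GB_PER_BILLION
        gb_per_shard    = total_index_gb / num_shards
        docs_per_shard  = TOTAL_DOCS / num_shards
        qps_cluster     = QUERIES_PER_MIN / 60.0
        # Every query fans out to one replica of every shard, so each shard's
        # replica set sees the full cluster rate, split across its replicas.
        qps_per_replica = qps_cluster / replication_factor
        daily_churn     = DAILY_ADDS + DAILY_DELETES
        print(f"shards={num_shards} rf={replication_factor}: "
              f"{docs_per_shard/1e9:.2f}B docs/shard, "
              f"{gb_per_shard:.1f} GB index/shard, "
              f"{qps_cluster:.0f} QPS cluster-wide, "
              f"~{qps_per_replica:.0f} QPS per replica, "
              f"{daily_churn/1e6:.0f}M docs churned/day")

    # 16 shards as proposed in the mail, with two example replication factors.
    for rf in (2, 3):
        sketch(num_shards=16, replication_factor=rf)

Roughly, that's 75 GB of index total and about 4.7 GB per shard at 16 shards, with ~167 queries/second across the cluster. Note that raw index size on disk says almost nothing about the heap and OS cache you'll actually need to hit 500 ms at that rate, which is exactly why the small-cluster test matters.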