Thank you, Erick. That was also my own opinion.

2015-02-18 7:12 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:

> Well, it's really impossible to say, you have to prototype. Here's
> something explaining this a bit:
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> This is a major undertaking. Your question is simply impossible to
> answer without prototyping as in the link above, anything else is
> guesswork. And at this scale being wrong is expensive.
>
> So my advice would be to test on a small "cluster", say a 2 shard
> system and see what kind of performance you can get and extrapolate
> from there, with your data, your queries etc. Perhaps work with your
> client on a limited-scope proof-of-concept. Plan on spending some time
> tuning even the small cluster to get enough answers to form a go/no-go
> decision.
>
> Best,
> Erick
>
>
> On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
> <dominique.bej...@eolya.fr> wrote:
> > One of our customers needs to index 15 billion documents in a
> > collection. As this volume is not usual for me, I need some advice
> > about SolrCloud sizing (how many servers, nodes, shards, replicas,
> > memory, ...)
> >
> > Some inputs:
> >
> > - Collection size: 15 billion documents
> > - Collection updates: 8 million new documents / day + 8 million
> >   deleted documents / day
> > - Updates occur during the night without queries
> > - Queries occur during the day without updates
> > - Document size is nearly 300 bytes
> > - Document fields are mainly strings, including one date field
> > - The same terms will occur several times for a given field (from
> >   10 to 100,000)
> > - Queries will use a date period and a filter query on one or more
> >   fields
> > - 10,000 queries / minute
> > - Expected response time < 500 ms
> > - 1 billion documents indexed = 5 GB index size
> > - No SSD drives
> >
> > So, what is your advice about:
> >
> > # of shards: 15 billion documents -> 16 shards?
> > # of replicas?
> > # of nodes = # of shards?
> > heap memory per node?
> > direct memory per node?
> >
> > Thanks for your advice.
> >
> > Dominique
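
For reference, the inputs quoted above can be turned into a few back-of-the-envelope figures. This is only a sketch of the arithmetic, not a sizing answer: as Erick says, real numbers have to come from a prototype. The function and key names are hypothetical, the 16-shard figure is simply the one floated in the question, and two assumptions go beyond the original mail: that each query fans out to every shard (no document routing), and that Lucene's hard per-index document limit (2,147,483,519) is the relevant ceiling per shard.

```python
# Back-of-the-envelope arithmetic from the figures in the quoted question.
# Nothing here replaces prototyping; it only restates the inputs per shard.

TOTAL_DOCS = 15_000_000_000          # collection size from the question
INDEX_GB_PER_BILLION_DOCS = 5        # "1 billion documents indexed = 5 GB"
QUERIES_PER_MINUTE = 10_000          # query load from the question
NEW_DOCS_PER_DAY = 8_000_000         # nightly additions
DELETED_DOCS_PER_DAY = 8_000_000     # nightly deletions
CANDIDATE_SHARDS = 16                # figure floated in the question, not a recommendation

# Assumption not in the original mail: Lucene's hard limit on documents in a
# single index (one shard replica), Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS_PER_INDEX = 2_147_483_519


def estimate(total_docs: int, shards: int) -> dict:
    """Return rough per-collection and per-shard figures (hypothetical helper)."""
    docs_per_shard = -(-total_docs // shards)  # integer ceiling division
    total_index_gb = total_docs / 1_000_000_000 * INDEX_GB_PER_BILLION_DOCS
    daily_churn = NEW_DOCS_PER_DAY + DELETED_DOCS_PER_DAY
    return {
        "total_index_gb": total_index_gb,
        "docs_per_shard": docs_per_shard,
        "index_gb_per_shard": total_index_gb / shards,
        # Assumption: without document routing, every query reaches every
        # shard, so each shard sees the full query rate.
        "queries_per_second": QUERIES_PER_MINUTE / 60,
        "daily_churn_fraction": daily_churn / total_docs,
        "under_lucene_limit": docs_per_shard < LUCENE_MAX_DOCS_PER_INDEX,
    }


if __name__ == "__main__":
    for name, value in estimate(TOTAL_DOCS, CANDIDATE_SHARDS).items():
        print(f"{name}: {value}")
```

With 16 shards, each shard would hold a little under a billion documents, which is below the Lucene limit but leaves limited headroom once deleted-but-not-yet-merged documents are counted. Whether such shards can answer the full query rate within 500 ms on non-SSD disks is exactly what the small-cluster test is meant to show.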