Well, it's really impossible to say; you have to prototype. Here's something
explaining this a bit:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

This is a major undertaking. Your question is simply impossible to answer
without prototyping as in the link above; anything else is guesswork. And at
this scale, being wrong is expensive.

So my advice would be to test on a small "cluster", say a 2-shard system,
see what kind of performance you can get, and extrapolate from there with
your data, your queries, etc. Perhaps work with your client on a
limited-scope proof of concept. Plan on spending some time tuning even the
small cluster to get enough answers to form a go/no-go decision.
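For that proof of concept, something along these lines can get you first
numbers quickly. This is only a rough, untested sketch against Solr's
standard HTTP APIs; the collection name, config set name, and field names
are placeholders, and you'd substitute a representative sample of your real
documents and queries:

    # Rough smoke-test sketch (untested; adjust hosts, config set and
    # field names to your own schema).
    import time
    import requests

    SOLR = "http://localhost:8983/solr"   # any node in the test cluster

    # 1. Create a small 2-shard test collection via the Collections API.
    requests.get(SOLR + "/admin/collections", params={
        "action": "CREATE",
        "name": "sizing_test",
        "numShards": 2,
        "replicationFactor": 1,
        "collection.configName": "my_configs",  # config set uploaded to ZK
    })

    # 2. Index a sample batch (placeholder fields -- use your real docs).
    docs = [{"id": str(i),
             "label_s": "value%d" % (i % 100),
             "created_dt": "2015-02-17T00:00:00Z"} for i in range(10000)]
    requests.post(SOLR + "/sizing_test/update", json=docs,
                  params={"commit": "true"})

    # 3. Time a query shaped like production: date range + filter queries.
    start = time.time()
    resp = requests.get(SOLR + "/sizing_test/select", params={
        "q": "*:*",
        "fq": ["created_dt:[2015-01-01T00:00:00Z TO 2015-03-01T00:00:00Z]",
               "label_s:value42"],
        "rows": 10,
        "wt": "json",
    })
    print("hits:", resp.json()["response"]["numFound"],
          "elapsed: %.0f ms" % ((time.time() - start) * 1000))

For the 10,000 queries/minute target you'd then replay a realistic query log
with a load tool (JMeter or similar) rather than single requests, and watch
heap, page cache, and per-shard index size while you do it.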

Best,
Erick


On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
<dominique.bej...@eolya.fr> wrote:
> One of our customers needs to index 15 billion documents in a collection.
> As this volume is not usual for me, I need some advice about SolrCloud
> sizing (how many servers, nodes, shards, replicas, memory, ...).
>
> Some inputs:
>
>    - Collection size: 15 billion documents
>    - Collection updates: 8 million new documents / day + 8 million
>    deleted documents / day
>    - Updates occur during the night without queries
>    - Queries occur during the day without updates
>    - Document size is nearly 300 bytes
>    - Document fields are mainly strings, including one date field
>    - The same term will occur several times for a given field (from 10 to
>    100,000 occurrences)
>    - Queries will use a date range and a filter query on one or more fields
>    - 10,000 queries / minute
>    - expected response time < 500 ms
>    - 1 billion documents indexed = 5 GB index size
>    - no SSD drives
>
> So, what is your advice about:
>
> # of shards: 15 billion documents -> 16 shards?
> # of replicas?
> # of nodes = # of shards?
> heap memory per node?
> direct memory per node?
>
> Thanks for your advice.
>
> Dominique
