Thank you, Erick.

That was my own opinion as well.

2015-02-18 7:12 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:

> Well, it's really impossible to say; you have to prototype. Here's
> something explaining this a bit:
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> This is a major undertaking. Your question is simply impossible to
> answer without prototyping as in the link above; anything else is
> guesswork. And at this scale, being wrong is expensive.
>
> So my advice would be to test on a small "cluster", say a two-shard
> system, see what kind of performance you can get, and extrapolate
> from there with your data, your queries, etc. (see the sketch below).
> Perhaps work with your client on a limited-scope proof-of-concept.
> Plan on spending some time tuning even the small cluster to get
> enough answers to form a go/no-go decision.
>
> Best,
> Erick
>
>
> On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
> <dominique.bej...@eolya.fr> wrote:
> > One of our customers needs to index 15 billion documents in a collection.
> > As this volume is unusual for me, I need some advice about SolrCloud
> > sizing (how many servers, nodes, shards, and replicas, how much memory, ...).
> >
> > Some inputs:
> >
> >    - Collection size: 15 billion documents
> >    - Collection updates: 8 million new documents/day + 8 million
> >    deleted documents/day
> >    - Updates occur during the night, without queries
> >    - Queries occur during the day, without updates
> >    - Document size is nearly 300 bytes
> >    - Document fields are mainly strings, including one date field
> >    - The same term will occur several times in a given field (from 10
> >    to 100,000 occurrences)
> >    - Queries will use a date period and a filter query on one or more
> >    fields (see the sketch after this list)
> >    - 10,000 queries/minute
> >    - Expected response time < 500 ms
> >    - 1 billion documents indexed = 5 GB index size
> >    - No SSD drives
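> >
> > To make the workload concrete, here is a minimal SolrJ 4.x sketch of the
> > nightly update batch and the daytime query shape. The collection name
> > "poc" and the field names "id", "date" and "field1" are placeholders,
> > not the real schema:
> >
> >     import org.apache.solr.client.solrj.SolrQuery;
> >     import org.apache.solr.client.solrj.impl.CloudSolrServer;
> >     import org.apache.solr.client.solrj.response.QueryResponse;
> >     import org.apache.solr.common.SolrInputDocument;
> >
> >     public class WorkloadSketch {
> >         public static void main(String[] args) throws Exception {
> >             CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
> >             server.setDefaultCollection("poc");
> >
> >             // Nightly batch: add new documents, delete old ones, commit once at the end.
> >             SolrInputDocument doc = new SolrInputDocument();
> >             doc.addField("id", "doc-1");
> >             doc.addField("date", "2015-02-17T00:00:00Z");
> >             doc.addField("field1", "some-term");
> >             server.add(doc);
> >             server.deleteByQuery("date:[* TO 2015-01-17T00:00:00Z]");
> >             server.commit();
> >
> >             // Daytime query shape: a date period plus filter queries on one or more fields.
> >             SolrQuery query = new SolrQuery("*:*");
> >             query.addFilterQuery("date:[2015-02-01T00:00:00Z TO 2015-02-17T23:59:59Z]");
> >             query.addFilterQuery("field1:some-term");
> >             query.setRows(10);
> >             QueryResponse response = server.query(query);
> >             System.out.println("hits: " + response.getResults().getNumFound());
> >
> >             server.shutdown();
> >         }
> >     }
> >
> > Repeated filter queries like these are served from Solr's filterCache,
> > which matters at 10,000 queries/minute.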
> >
> > So, what is your advice about:
> >
> > # of shards: 15 billion documents -> 16 shards? (see the arithmetic below)
> > # of replicas?
> > # of nodes = # of shards?
> > heap memory per node?
> > direct memory per node?
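> >
> > For what it's worth, the raw numbers above (taking the 5 GB per billion
> > documents figure at face value) work out to:
> >
> >     15 x 5 GB = 75 GB total index
> >     75 GB / 16 shards = ~4.7 GB per shard (~940 million documents per shard)
> >
> > So the raw index size is modest; the per-shard document count and the
> > query rate seem more likely to drive the sizing than disk footprint.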
> >
> > Thanks for your advice.
> >
> > Dominique
>
