Thanks, guys, for your input. I would be looking at around 100 TB of
total index size, with 5,100 million documents, over a period of 30 days
before we purge the indexes. I have estimated slightly on the high side,
but that's where I feel we would be.
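
As a rough sanity check on those figures, here is a small back-of-envelope
sketch. It assumes decimal terabytes, an even spread of documents across
the 30 days, and Lucene's hard per-index (per-shard) document limit of
2,147,483,519; the function and key names are only illustrative. The
shard figure is a lower bound from the document limit alone, not a
sizing recommendation.

```python
import math

# Figures from this thread.
TOTAL_INDEX_BYTES = 100 * 10**12      # ~100 TB, assuming decimal terabytes
TOTAL_DOCS = 5100 * 10**6             # 5,100 million documents
RETENTION_DAYS = 30
PEAK_WRITES_PER_SECOND = 12000

# Lucene's hard limit on documents in a single index, i.e. one Solr shard.
LUCENE_MAX_DOCS_PER_SHARD = 2147483519

SECONDS_PER_DAY = 86400


def summarize(total_bytes, total_docs, days, writes_per_second):
    """Derive per-document and per-day figures from the totals."""
    docs_per_day = total_docs / days
    return {
        "bytes_per_doc": total_bytes / total_docs,
        "docs_per_day": docs_per_day,
        "avg_docs_per_second": docs_per_day / SECONDS_PER_DAY,
        # What 30 days would hold if the stated write rate were sustained.
        "docs_if_rate_sustained": writes_per_second * SECONDS_PER_DAY * days,
        # Lower bound only: real sizing depends on index size and queries.
        "min_shards_by_doc_limit": math.ceil(
            total_docs / LUCENE_MAX_DOCS_PER_SHARD
        ),
    }


if __name__ == "__main__":
    for name, value in summarize(
        TOTAL_INDEX_BYTES, TOTAL_DOCS, RETENTION_DAYS, PEAK_WRITES_PER_SECOND
    ).items():
        print(f"{name}: {value:,.0f}")
```

With these inputs that works out to roughly 19.6 KB of index per
document, 170 million documents per day (just under 2,000 per second on
average), and at least 3 shards from the document limit alone. A
sustained 12,000 writes/second would give about 31 billion documents in
30 days, so the two figures fit together only if 12,000/second is a peak
rate rather than an average.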

Thanks,
Nishanth

On Wed, Jan 7, 2015 at 7:50 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/7/2015 7:14 PM, Nishanth S wrote:
> > Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for
> > the moment would be around 1,000 reads/second. I guess finding out the
> > right number of shards would be my starting point.
>
> I don't think indexing 12000 docs per second would be too much for Solr
> to handle, as long as you architect the indexing application properly.
> You would likely need to have several indexing threads or processes that
> index in parallel.  Solr is fully thread-safe and can handle several
> indexing requests at the same time.  If the indexing application is
> single-threaded, indexing speed will not reach its full potential.
>
> Be aware that indexing at the same time as querying will reduce the
> number of queries per second that you can handle.  In an environment
> where both reads and writes are heavy like you have described, more
> shards and/or more replicas might be required.
>
> For the query side ... even 1000 queries per second is a fairly heavy
> query rate.  You're likely to need at least a few replicas, possibly
> several, to handle that.  The type and complexity of the queries you do
> will make a big difference as well.  To handle that query level, I would
> still recommend only running one shard replica on each server.  If you
> have three shards and three replicas, that means 9 Solr servers.
>
> How many documents will you have in total?  You said they are about 6KB
> each ... but depending on the fieldType definitions (and the analysis
> chain for TextField types), 6KB might be very large or fairly small.
>
> Do you have any idea how large the Solr index will be with all your
> documents?  Estimating that will require indexing a significant
> percentage of your documents with the actual schema and config that you
> will use in production.
>
> If I know how many documents you have, how large the full index will be,
> and can see an example of the more complex queries you will do, I can
> make *preliminary* guesses about the number of shards you might need.  I
> do have to warn you that it will only be a guess.  You'll have to
> experiment to see what works best.
>
> Thanks,
> Shawn
>
>
