On 8/15/2015 2:03 PM, Troy Edwards wrote:
> I am using SolrCloud
> 
> My initial requirements are:
> 
> 1) There are about 6000 clients
> 2) The number of documents from each client are about 500000 (average
> document size is about 400 bytes)
> 3) I have to wipe off the index/collection every night and create new
> 
> Any thoughts/ideas/suggestions on:
> 
> 1) How to index such large number of documents i.e. do I use an http client
> to send documents or is data import handler right or should I try uploading
> CSV files?

This is general info only.

6000 clients, each with half a million docs?  That's 3 billion docs.
There are some users who have more, but this is squarely in the realm of
a HUGE install.
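
For reference, here is that arithmetic as a small Python sketch. The client
count, per-client document count, and average document size come from your
message. The rebuild window is my own assumption, added only to show what a
nightly wipe-and-rebuild implies for indexing throughput. The raw size is
source data, not index size, which can only be measured.

```python
# Back-of-the-envelope totals from the numbers in the original question.
CLIENTS = 6000
DOCS_PER_CLIENT = 500_000
AVG_DOC_BYTES = 400          # average source document size, NOT index size

# ASSUMPTION (not from the question): hours available for the nightly rebuild.
REBUILD_WINDOW_HOURS = 8

total_docs = CLIENTS * DOCS_PER_CLIENT
raw_bytes = total_docs * AVG_DOC_BYTES
docs_per_second = total_docs / (REBUILD_WINDOW_HOURS * 3600)

print(f"total docs:        {total_docs:,}")
print(f"raw source data:   {raw_bytes / 1e12:.1f} TB")
print(f"sustained rate:    {docs_per_second:,.0f} docs/sec over {REBUILD_WINDOW_HOURS}h")
```

With an 8 hour window, that works out to a sustained rate of a bit over
100,000 documents per second across the whole cluster, every night.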

> 2) How many collections should I use?
> 
> 3) How many shards / replicas per collection should I use?

Any answer we came up with for those two questions would involve quite a
few assumptions, any one of which could be wrong.  The only way to
really find out what you need is to set up a prototype system and test
it with real data, real indexing requests, and real queries.  Record the
results of the tests, change the configuration, rebuild the index(es),
and run the tests again.
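
A minimal sketch of the "record the results" part, assuming you wrap whatever
client you use to query Solr in a plain function. The names measure,
run_query, and fake_query are hypothetical, and fake_query is only a stand-in
so the example runs without a Solr server. Swap in a function that sends a
real query to your prototype.

```python
import statistics
import time


def measure(run_query, queries):
    """Time each query and summarize the latencies in milliseconds."""
    latencies_ms = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # 99 cut points for n=100; index 94 is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "count": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "max_ms": max(latencies_ms),
    }


def fake_query(q):
    """Stand-in for a real Solr request (hypothetical placeholder)."""
    time.sleep(0.001)


if __name__ == "__main__":
    # Use queries captured from real traffic, not made-up ones.
    print(measure(fake_query, ["q1", "q2", "q3", "q4", "q5"]))
```

Save one such summary per configuration so that runs can be compared after
each change and rebuild.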

The number one rule when it comes to Solr performance: Install enough
memory so that all the index data on the server will fit in the
available OS disk cache RAM.  You're going to have a lot of index data.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

https://wiki.apache.org/solr/SolrPerformanceProblems
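
To make the memory rule concrete, here is a rough sketch of how it turns into
a server count. Every number in it is a placeholder and every name is
hypothetical: the on-disk index size has to be measured on a prototype, and
the RAM, Java heap, and OS overhead figures are assumptions about hardware
you might buy. The 1200 GB figure is just the raw source size from your
numbers (6000 x 500000 x 400 bytes), not a prediction of index size.

```python
import math


def servers_needed(total_index_gb, ram_per_server_gb, heap_gb,
                   copies=1, os_overhead_gb=2.0):
    """Servers required so every copy of the index fits in OS disk cache.

    copies is the total number of copies of each shard (1 = no redundancy).
    """
    # RAM left for the OS disk cache after the Java heap and the OS itself.
    cache_gb = ram_per_server_gb - heap_gb - os_overhead_gb
    if cache_gb <= 0:
        raise ValueError("no RAM left over for the OS disk cache")
    return math.ceil(total_index_gb * copies / cache_gb)


if __name__ == "__main__":
    # PLACEHOLDERS: 1200 GB index, 128 GB servers, 8 GB heap, 2 copies.
    print(servers_needed(1200, 128, 8, copies=2))
```

The point of the sketch is only that the count scales with measured index
size and with the number of copies, so measure the index before buying
hardware.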

When the number of collections reaches the low hundreds, SolrCloud
stability begins to suffer because of how much interaction with
ZooKeeper is required for even very small cluster changes.  When there are
thousands of collections, any little problem turns into a nightmare.
Adding more machines doesn't help this particular problem.  Some ideas
are being discussed to make this better, but users won't see the results
of that effort until version 5.4 or 5.5, possibly later.

> 4) Do I need multiple Solr servers?

You would need multiple servers for any hope of redundancy, but the
answer to the question I think you're trying to ask here is yes.
Definitely.  Possibly a LOT of them.

Thanks,
Shawn
