yura last <y_ura_2...@yahoo.com.INVALID> wrote:
> Hi All, I am testing a SolrCloud with many collections. The version is 5.2.1
> and I installed 3 machines – each one with 4 cores and 8 GB RAM. Then I
> created collections with 3 shards and replication factor of 2. It gives me 2
> cores per collection on each machine. I reached almost 900 collections
> and then the cluster was stuck and I couldn’t revive the cluster.

That mirrors what others are reporting.

> As I understand Solr have issues with many collections (thousands). If I
> will use much more machines – does it will give me the ability to create
> tens of thousands of collections or the limit is couple of thousands?

(Caveat: I have no real-world experience with high collection counts in Solr)

Adding more machines will not really help you as the problem with thousands of 
collections is not hardware power per se, but rather the coordination of them. 
You mention 180K collections below and with the current Solr architecture, I do 
not see that happening.
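
To put rough numbers on the coordination problem, here is a back-of-the-envelope
sketch. It uses only figures from this thread (3 machines, 3 shards, replication
factor 2, about 900 collections, 2000 customers, 90 days). The function name is
made up for illustration, and it assumes replicas are spread evenly across
machines.

```python
def cores_per_machine(collections, shards, replication_factor, machines):
    """Solr cores each machine hosts, assuming replicas spread evenly."""
    total_cores = collections * shards * replication_factor
    return total_cores / machines

# The test cluster from this thread: stuck at about 900 collections.
print(cores_per_machine(900, 3, 2, 3))                   # 1800.0

# The planned layout: one collection per customer per day, kept for 90 days.
planned_collections = 2000 * 90                          # 180000
print(cores_per_machine(planned_collections, 3, 2, 3))   # 360000.0
```

Adding machines lowers the per-machine figure, but the number of collections
the cluster has to coordinate stays the same.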

> I want to build a cluster that will handle 10 billion of documents (currently
> I have 1 billion) per day and to keep the data for 90 days.

Are those real requirements or something somebody hopes will come true some 
years down the road? Technology has a habit of catching up and while a 900 
billion document setup is a challenge today, it is probably a lot easier in 5 
years.

While we are discussing this, it would help if you could also approximate the 
index size in bytes. How large do you expect the sum of shards for 1 billion of 
your documents to be? Likewise, which kind of queries do you expect? Grouping? 
Faceting? All these things multiply.

Anyway, your requirements are in a league where there is not much collective 
experience. You will definitely have to build a serious prototype or three to 
get a proper idea of how much power you need: the standard advice for scaling 
Solr does not make economic sense beyond a point. But you seem to have 
started that process already with your current tests.

> I want to support 2000 customers so I would like to split them to collections
> and also to split it by days. (180,000 collections) 

As 180,000 collections currently seems infeasible for a single SolrCloud, you 
should consider alternatives:

1) If your collections are independent, then build fully independent clusters 
of machines.

2) Don't use collections for dividing data between your customers. Use a field 
with a customer-ID or something like that.
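
As a minimal sketch of option 2: a per-customer query becomes an ordinary query
plus a filter query (`fq`) on the customer field. The field names `customer_id`
and `day` are assumptions for illustration; they would have to exist in your
schema. `q` and `fq` are the standard Solr request parameters.

```python
from urllib.parse import urlencode

def customer_query_params(user_query, customer_id, day=None):
    """Build select-handler parameters restricted to one customer.

    customer_id and day are hypothetical field names, not taken from
    the original poster's schema.
    """
    params = [("q", user_query), ("fq", f"customer_id:{customer_id}")]
    if day is not None:
        # A second fq narrows the result further; filters are ANDed together.
        params.append(("fq", f'day:"{day}"'))
    return urlencode(params)

# Query string to append to .../<collection>/select?
print(customer_query_params("error timeout", 42, "2015-08-01"))
```

Cross-customer queries then simply leave out the `customer_id` filter.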

> If I will create big collections I will have performance issues with queries
> and also most of the queries are for a specific customer.

Why would many smaller collections have better performance than fewer larger 
collections?

> (I also have cross customers queries)

If you make independent setups, that could be solved by querying them 
independently and doing the merging yourself.
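
A rough sketch of such a merge, assuming each cluster returns its hits already
sorted by descending score. The dictionaries below are made-up sample data, and
`merge_ranked` is a hypothetical helper, not part of Solr or SolrJ.

```python
import heapq
from itertools import islice

def merge_ranked(result_lists, rows=10):
    """Merge per-cluster hit lists, each already sorted by descending score."""
    merged = heapq.merge(*result_lists,
                         key=lambda hit: hit["score"], reverse=True)
    return list(islice(merged, rows))

# Made-up hits from two independent clusters.
cluster_a = [{"id": "a1", "score": 9.1}, {"id": "a2", "score": 3.0}]
cluster_b = [{"id": "b1", "score": 7.5}, {"id": "b2", "score": 6.2}]

top = merge_ranked([cluster_a, cluster_b], rows=3)
print([hit["id"] for hit in top])   # ['a1', 'b1', 'b2']
```

Be aware that relevance scores from independent indexes are computed from each
index's own term statistics, so they are not strictly comparable. Merging on an
explicit sort field such as a timestamp avoids that problem.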

- Toke Eskildsen
