Thanks for your answers.

Currently I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index around
60 million documents per day - the index size is around 26 GB. I do have a
customer ID today and I use it in the queries. I don't split the customers, but I
get bad performance.

If I make a small collection for each customer, then I know to query only those
collections and I get better performance - the indexes are smaller and Solr
doesn't need to keep the other customers' data in memory. I checked it and the
performance is much better.
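
Roughly, this is the access pattern I tested with SolrJ - just a sketch, and the
collection naming scheme ("c1234_20150815"), the ZooKeeper hosts and the query
are only examples, not my real setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerCustomerCollectionQuery {
    public static void main(String[] args) throws Exception {
        // One client for the whole SolrCloud (ZooKeeper ensemble is an example).
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        try {
            // Each customer/day pair gets its own small collection, e.g. "c1234_20150815".
            // The query only touches that one small index, so the other customers'
            // data does not have to compete for memory.
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(10);
            QueryResponse rsp = client.query("c1234_20150815", q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        } finally {
            client.close();
        }
    }
}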

I do have 1 billion documents per day today, but I can't index them - so it is a
real requirement today to be able to index 1 billion per day and keep the data
for 90 days. We want to grow and support more customers, so I want to understand
what design I need for 10 billion per day.
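
For a rough sense of scale (extrapolating from my current ratio of about 26 GB
per 60 million documents - I know the ratio may not hold at larger volumes, and
it ignores the replication factor):

  26 GB / 60M docs         ~ 0.43 KB per document
  1 billion docs per day   ~ 430 GB/day  -> ~39 TB for 90 days
  10 billion docs per day  ~ 4.3 TB/day  -> ~390 TB for 90 days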

I will think about whether I can split the customers across clusters and merge
the results myself - it is a good idea. Thanks for the advice.

What is better - one powerful machine or a few smaller ones? For example, one
machine with 12 cores, 256 GB RAM and 2.5 TB, or 5 machines each with 4 cores,
32 GB RAM and 0.5 TB?

Thanks,
Yuri


     On Saturday, August 15, 2015 5:53 PM, Toke Eskildsen 
<t...@statsbiblioteket.dk> wrote:
   

 yura last <y_ura_2...@yahoo.com.INVALID> wrote:
> Hi All, I am testing a SolrCloud with many collections. The version is 5.2.1
> and I installed 3 machines – each one with 4 cores and 8 GB RAM. Then I
> created collections with 3 shards and a replication factor of 2. That gives me
> 2 cores per collection on each machine. I reached almost 900 collections and
> then the cluster got stuck and I couldn’t revive it.

That mirrors what others are reporting.

> As I understand it, Solr has issues with many collections (thousands). If I
> use many more machines, will that give me the ability to create tens of
> thousands of collections, or is the limit a couple of thousand?

(Caveat: I have no real-world experience with high collection counts in Solr)

Adding more machines will not really help you, as the problem with thousands of
collections is not hardware power per se, but rather coordinating them. You
mention 180K collections below, and with the current Solr architecture I do not
see that happening.

> I want to build a cluster that will handle 10 billion documents per day
> (currently I have 1 billion) and keep the data for 90 days.

Are those real requirements, or something somebody hopes will come true some
years down the road? Technology has a habit of catching up, and while a
900-billion-document setup is a challenge today, it will probably be a lot
easier in 5 years.

While we are discussing this, it would help if you could also approximate the
index size in bytes. How large do you expect the sum of shards for 1 billion of
your documents to be? Likewise, what kinds of queries do you expect? Grouping?
Faceting? All these things multiply.

Anyway, your requirements are in a league where there is not much collective
experience. You will definitely have to build a serious prototype or three to
get a proper idea of how much power you need: the standard advice for scaling
Solr does not make economic sense beyond a point. But you seem to have started
that process already with your current tests.

> I want to support 2000 customers, so I would like to split them into
> collections and also split by day (180,000 collections).

As 180,000 collections currently seems infeasible for a single SolrCloud, you 
should consider alternatives:

1) If your collections are independent, then build fully independent clusters 
of machines.

2) Don't use collections for dividing data between your customers. Use a field 
with a customer-ID or something like that.
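
With that approach, the per-customer restriction becomes a filter query against
one big (or per-day) collection. A rough sketch - the field name "customer_id",
the collection name and the ZooKeeper hosts are placeholders, not something I
know about your setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SharedCollectionQuery {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        try {
            SolrQuery q = new SolrQuery("*:*");
            // Restrict to a single customer with a filter query. Filter queries
            // are cached independently of the main query, so repeated queries
            // for the same customer become cheaper.
            q.addFilterQuery("customer_id:1234");
            q.setRows(10);
            QueryResponse rsp = client.query("logs_shared", q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        } finally {
            client.close();
        }
    }
}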

> If I create big collections, I will have performance issues with queries,
> and also most of the queries are for a specific customer.

Why would many smaller collections have better performance than fewer larger 
collections?

> (I also have cross-customer queries)

If you make independent setups, that could be solved by querying them
independently and doing the merging yourself.
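
Something along these lines - a sketch only, with placeholder cluster URLs, and
with the caveat that relevance scores from independent clusters are not directly
comparable, so sorting the merged list by score is only a crude merge:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class CrossClusterQuery {
    public static void main(String[] args) throws Exception {
        // One base URL per independent cluster/collection (placeholders).
        String[] clusters = {
            "http://cluster-a:8983/solr/logs",
            "http://cluster-b:8983/solr/logs"
        };

        SolrQuery q = new SolrQuery("*:*");
        q.setFields("*", "score");   // include the score so there is something to merge on
        q.setRows(10);

        List<SolrDocument> merged = new ArrayList<>();
        for (String url : clusters) {
            try (HttpSolrClient client = new HttpSolrClient(url)) {
                merged.addAll(client.query(q).getResults());
            }
        }

        // Crude client-side merge: sort everything by score and keep the top 10.
        merged.sort(Comparator.comparing(
                (SolrDocument d) -> (Float) d.getFieldValue("score")).reversed());
        merged.stream().limit(10)
              .forEach(d -> System.out.println(d.getFieldValue("id"))); // "id" assumed to be the uniqueKey
    }
}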

- Toke Eskildsen


  
