"LucidWorks achieved 150k docs/second"
This is only valid if you don't have replication. I don't know your use case, but a realistic deployment normally uses some kind of redundancy so you don't lose data on a hardware failure: at least 2 replicas, and more replicas imply a further reduction in throughput. Also don't forget that in a realistic use case you have to handle reads too.

Our cluster is small for the data we hold (12 machines with SSDs and 32G of RAM), but we don't need sub-second queries; we need facets with high cardinality (in the worst-case scenario we aggregate 5M unique string values).

As Shawn probably told you, sizing your cluster is a trial-and-error path. Our cluster is optimized to handle a low rate of reads, facet queries, and a high rate of inserts. At peak insert load we can handle around 25K docs per second without any issue with 2 replicas, without compromising reads or putting a node under stress. Nodes under stress can eject themselves from the ZooKeeper cluster due to a GC pause or a lack of CPU to keep up with the heartbeat. If you want accurate numbers you need to run your own tests.

Keep in mind what is, in my opinion, the most important thing about Solr: at terabyte scale, any field type change in the schema or any Lucene codec change will force you to do a full reindex. Every time I need to upgrade Solr to a major release it's a pain in the ass to convert the segments if they are not compatible with the newer version. This can take months, it will not guarantee that your data ends up identical to a cleanly built index (voodoo-magic things can happen, trust me), and it will drain a huge amount of hardware resources to do it without downtime.

(Two illustrative sketches of this kind of setup follow the quoted thread below.)

--
/Yago Riveiro

On Sep 24 2016, at 7:48 am, S G <sg.online.em...@gmail.com> wrote:

> Hey Yago,
>
> 12 T is very impressive.
>
> Can you also share some numbers about the shards, replicas, machine count/specs, and docs/second for your case? I think you would not be having a single index of 12 TB either, so some details on that would be really helpful too.
>
> https://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/ is a good post on how LucidWorks achieved 150k docs/second. If you have any similar blog post, that would be quite useful and popular too.
>
> --SG
>
> On Fri, Sep 23, 2016 at 5:00 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>
> > In my company we have a SolrCloud cluster with 12T.
> >
> > My advice:
> >
> > Be generous with CPU; you will need it at some point (very important if you have no control over the kind of queries hitting the cluster: clients are greedy, they want all results at the same time).
> >
> > SSDs and memory (as much as you can afford if you will do facets).
> >
> > Full recoveries are a pain; the network is important and should be as fast as possible, never less than 1Gbit.
> >
> > Divide and conquer, but too much division can lead to expensive overhead, since data travels over the network. Find the sweet spot (only by testing your use case will you know it).
> >
> > --
> > /Yago Riveiro
> >
> > On 23 Sep 2016, 23:44 +0100, Pushkar Raste <pushkar.ra...@gmail.com> wrote:
> > > Solr is RAM hungry. Make sure that you have enough RAM to hold most of the index of a core in RAM itself.
> > >
> > > You should also consider using really good SSDs.
> > >
> > > That would be a good start. Like others said, test and verify your setup.
> > >
> > > --Pushkar Raste
> > >
> > > On Sep 23, 2016 4:58 PM, "Jeffery Yuan" <yuanyun...@gmail.com> wrote:
> > > >
> > > > Thanks so much for your prompt reply.
> > > >
> > > > We are definitely going to use SolrCloud.
> > > >
> > > > I am just wondering whether SolrCloud can scale even at the TB data level and what kind of hardware configuration it would need.
> > > >
> > > > Thanks.
> > > >
> > > > --
> > > > View this message in context: http://lucene.472066.n3.nabble.com/Whether-solr-can-support-2-TB-data-tp4297790p4297800.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
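
First sketch: how a collection along the lines of the setup described above might be created with the standard Collections API over HTTP. This is only an illustration, not the actual configuration from the thread; the collection name, config name, and Solr URL are placeholders.

```python
# Minimal sketch, not from the thread: create a SolrCloud collection with
# 2 replicas per shard, similar in spirit to the 12-node / 2-replica setup
# described above. Collection name, config name, and URL are placeholders.
import requests

SOLR_URL = "http://localhost:8983/solr"  # any node of the cluster

params = {
    "action": "CREATE",
    "name": "events",                          # hypothetical collection name
    "numShards": 12,                           # e.g. one shard per machine
    "replicationFactor": 2,                    # every document is written to 2 replicas
    "maxShardsPerNode": 2,                     # 12 shards x 2 copies on 12 nodes
    "collection.configName": "events_config",  # configset already uploaded to ZooKeeper
}

resp = requests.get(f"{SOLR_URL}/admin/collections", params=params, timeout=120)
resp.raise_for_status()
print(resp.json())
```

With replicationFactor=2 each document is indexed on two nodes, which is one reason single-copy benchmark numbers like the 150k docs/second figure do not carry over directly.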
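
Second sketch: the kind of high-cardinality terms facet mentioned above, expressed with Solr's JSON Facet API. The collection name, field name, and bucket limit are made up for illustration, and the field is assumed to be a docValues string field so the facet does not live on the Java heap.

```python
# Minimal sketch, not from the thread: a terms facet over a hypothetical
# high-cardinality string field via the JSON Facet API. Collection and field
# names are placeholders; the field is assumed to have docValues enabled.
import requests

SOLR_URL = "http://localhost:8983/solr"

query = {
    "query": "*:*",
    "limit": 0,                      # rows=0, only the facet buckets are wanted
    "facet": {
        "by_user": {
            "type": "terms",
            "field": "user_id_s",    # hypothetical field with millions of unique values
            "limit": 100,            # return only the top 100 buckets
            "mincount": 1,
        }
    },
}

resp = requests.post(f"{SOLR_URL}/events/query", json=query, timeout=60)
resp.raise_for_status()
for bucket in resp.json()["facets"]["by_user"]["buckets"]:
    print(bucket["val"], bucket["count"])
```

Aggregating millions of unique values per shard is exactly where the RAM and SSD advice in the quoted thread starts to matter.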