Erick - thanks a lot for answering and sharing the article below, it's very 
helpful!

I have another follow-up question - assuming we have 400 vCPUs across our 
SolrCloud cluster nodes, would it be better to have 400 shards with 
replication factor 2, or 200 shards with replication factor 4? Which utilizes 
the CPUs better - more shards, or more replicas per shard?
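
For concreteness, here is roughly how the two options we're weighing would 
look via the Collections API (a sketch only - the collection names, host, 
Python helper and maxShardsPerNode value are illustrative, not our real setup):

import urllib.request

SOLR = "http://localhost:8983/solr"  # any node in the cluster (placeholder)

def create_collection(name, num_shards, replication_factor):
    # Collections API CREATE call; configName and maxShardsPerNode are
    # placeholders to be adjusted for the real cluster.
    url = (f"{SOLR}/admin/collections?action=CREATE&name={name}"
           f"&numShards={num_shards}&replicationFactor={replication_factor}"
           f"&collection.configName=_default&maxShardsPerNode=200")
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# Option A: 400 shards x 2 replicas = 800 cores across 400 vCPUs
create_collection("option_a", num_shards=400, replication_factor=2)
# Option B: 200 shards x 4 replicas = 800 cores across 400 vCPUs
create_collection("option_b", num_shards=200, replication_factor=4)

Either way it's 800 cores in total, so I guess the real question is per-query 
fan-out (how many shards each query must touch) versus how many copies can 
share the query load.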

Thanks,
Adi

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
Sent: Thursday, August 1, 2019 6:48 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud recommended I/O RAID level

Yes, I’m exactly talking about the idea of someone pulling the plug on one of 
your machines. RAID doesn’t help you at all with that.

10M docs/shard is pretty low actually, depending on the characteristics of 
the docs of course. I’ve seen 10M docs strain things. I’ve seen 300M docs fit 
in a 16G heap. My personal straw-man starting point is 50M docs/shard. You 
have to test. Old but still valid: 
https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
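
Back-of-the-envelope, here's what the docs/shard assumption does to your 
shard count at the 3.2B upper estimate you mention below (pure arithmetic, 
nothing Solr-specific):

import math

total_docs = 3_200_000_000  # upper end of the stated 2-3.2B estimate

for docs_per_shard in (10_000_000, 50_000_000, 100_000_000):
    shards = math.ceil(total_docs / docs_per_shard)
    print(f"{docs_per_shard:,} docs/shard -> {shards} shards")
# 10M -> 320 shards, 50M -> 64 shards, 100M -> 32 shards

The 10M rule of thumb quintuples your shard count (and per-query fan-out) 
compared to the 50M starting point, which is exactly why you test rather 
than trust a number from a forum.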

bq. Will additional replicas assist with QPS

It Depends (tm). The article I linked above will help you answer the 
question; the only way I know of to answer definitively is to stress-test on 
a small cluster (2 nodes will do). Obviously if you’ve maxed out all your 
CPUs with no extra replicas, adding more replicas won’t help. But first you 
have to convince me that you’re maxing out all your CPUs.
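
Here's a minimal sketch of the kind of stress test I mean (the URL, 
collection name and canned query list are placeholders; you'd replay queries 
from your real logs):

import itertools
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr/test/select"  # placeholder node/collection
QUERIES = ["*:*", "title:solr", "text:raid"]     # stand-ins for real queries

def client(deadline):
    # One simulated client: fire queries back-to-back until the deadline,
    # returning how many completed.
    count = 0
    for q in itertools.cycle(QUERIES):
        if time.time() >= deadline:
            return count
        with urlopen(f"{SOLR}?q={quote(q)}&rows=10") as resp:
            resp.read()
        count += 1

def measure_qps(concurrency, seconds=30):
    deadline = time.time() + seconds
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        totals = list(pool.map(client, [deadline] * concurrency))
    return sum(totals) / seconds

# Step concurrency up until QPS plateaus, watching CPU on the nodes as you go.
for c in (1, 2, 4, 8, 16):
    print(c, "clients ->", round(measure_qps(c), 1), "QPS")

If QPS flattens out while the CPUs still have headroom, the bottleneck is 
somewhere else; if the CPUs peg, more replicas on more hardware is what buys 
you QPS.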

Test, test, test.

Best,
Erick

> On Aug 1, 2019, at 7:35 AM, Kaminski, Adi <adi.kamin...@verint.com> wrote:
>
> Hi Erick,
>
> Thanks for the detailed explanation, much appreciated.
>
> Regarding the #1 reason you mentioned - if RAID10 is chosen per Shawn's 
> suggestion, then we should be protected, no? Unless you mean the whole 
> server being unavailable, rather than a specific disk in the configured 
> RAID10 array.
>
> Regarding the #2 reason you mentioned -
>
> ·       If we have 7 Solr nodes/servers as part of the SolrCloud cluster, 
> with 48 vCPUs in each server (~336 vCPUs in total)
>
> ·       And we are talking about 2-3.2B Solr docs distributed across this 
> cluster
>
> ·       And assuming each shard will handle 10M Solr docs on average 
> (that's what we saw in some forums as a basic “rule of thumb”)
>
> ·       Questions:
>
> o   Will additional replicas assist with QPS? As we will basically have a 
> ratio of 1 vCPU per shard (the master replica)
>
> o   Wouldn't additional replicas oversubscribe the total amount of vCPUs, 
> and actually cause delays in queries and lower QPS? (rough math below)
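>
> For reference, the core-to-vCPU arithmetic behind this worry (a rough 
> sketch in Python; assumes 10M docs/shard at the 3.2B upper estimate):
>
>     vcpus  = 7 * 48      # = 336 vCPUs in the cluster
>     shards = 320         # 3.2B docs / 10M docs per shard
>     rf1 = shards * 1     # 320 cores -> ~1.05 vCPUs per core
>     rf2 = shards * 2     # 640 cores -> ~0.53 vCPUs per core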
>
> Thanks,
>
> Adi
>
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Thursday, August 1, 2019 2:03 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud recommended I/O RAID level
>
> “why would I need a replication factor of 2….”
>
> Two reasons:
>
> 1> assuming your replicas are located on different physical machines, you can 
> lose one and still keep on running. RAID won’t help if the machine dies.
>
> 2> capacity. Depending on your query load, having more than one replica 
> distributes query loads across multiple Solr instances, increasing the QPS 
> you can handle.
>
> If you don’t care about either of those issues, then no reason.
>
> Best,
>
> Erick
>
>> On Aug 1, 2019, at 2:11 AM, Kaminski, Adi <adi.kamin...@verint.com> wrote:
>>
>> Hi Shawn,
>>
>> Thanks for your reply; I fully agree with your comments, it further 
>> clarifies the need for RAID10 in this case.
>>
>> One additional follow-up question - if we follow these guidelines and use 
>> RAID10 (which leaves us with an effective capacity of 50%), why would I 
>> need a replication factor of 2 in our SolrCloud core/collection? Wouldn't 
>> it be a double protection layer, while the IO-layer mirroring of RAID10 
>> actually brings the value, with no need to copy anything when we have IO 
>> failures?
>>
>> Thanks,
>> Adi
>>
>> -----Original Message-----
>> From: Shawn Heisey <apa...@elyograg.org>
>> Sent: Tuesday, July 30, 2019 9:44 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud recommended I/O RAID level
>>
>> On 7/30/2019 12:12 PM, Kaminski, Adi wrote:
>>> Indeed RAID10 with both mirroring and striping should satisfy the need, 
>>> but per some benchmarks on the web there is still an impact on write 
>>> performance compared to RAID0, which is considered much better (attaching 
>>> a table that summarizes the different RAID levels and their pros/cons and 
>>> capacity ratios).
>>
>> RAID10 offers the best combination of performance and reliability.
>> RAID0 might beat it *slightly* on performance, but if ANY drive fails on 
>> RAID0, the entire volume is lost.
>>
>>> If we have ~200-320 shards spread across our 7 Solr node servers (part of 
>>> the SolrCloud cluster) in a single core/collection configured with 
>>> replication factor 2, shouldn't that supply application-level redundancy 
>>> of the indexed data?
>>
>> Yes, you could rely on Solr alone for data redundancy.  But if there's a 
>> drive failure, do you REALLY want to be single-stranded for the time it 
>> takes to rebuild the entire server and copy data?  That's what you would 
>> end up doing if you choose RAID0.
>>
>> It is true that RAID1 or RAID10 means you have to buy double your usable 
>> capacity.  I would argue that drives are cheap and will cost less than 
>> either downtime or sysadmin effort.
>>
>> Thanks,
>> Shawn


