It boils down to whether the response rate when you query a single
shard is "acceptable", plus some overhead for sharding.

So, if you need 100 QPS and the best you can get from a single shard
after tuning (which you can test with &distrib=false) is 10 QPS, you
need 10 replicas.
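
To make that arithmetic explicit, here is a rough sketch. It assumes
throughput scales linearly with the number of replicas, which is only an
approximation, and the function name is made up for illustration; it is
not part of Solr.

```python
import math

def replicas_needed(target_qps, single_replica_qps):
    """Estimate replicas per shard, assuming QPS scales linearly with replicas."""
    if single_replica_qps <= 0:
        raise ValueError("single_replica_qps must be positive")
    # Round up: a fractional replica still means one more node.
    return math.ceil(target_qps / single_replica_qps)

# The example from this mail: 100 QPS needed, 10 QPS measured per replica.
print(replicas_needed(100, 10))  # 10
```

Treat the result as a starting point to load test, not a guarantee.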

But if a single shard can only get you responses back in 10 seconds,
you need more shards.
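
Put together, the rule of thumb looks something like the sketch below.
The function name, arguments and thresholds are hypothetical; the
numbers you feed in would come from your own tests against a single
shard.

```python
def scaling_advice(latency_s, max_latency_s, measured_qps, target_qps):
    """Rule of thumb: slow single-shard responses -> shards; low throughput -> replicas."""
    if latency_s > max_latency_s:
        # One shard cannot answer fast enough, so split the index.
        return "add shards"
    if measured_qps < target_qps:
        # Responses are fast enough, there are just too many queries.
        return "add replicas"
    return "ok"

print(scaling_advice(10.0, 1.0, 10, 100))  # add shards
print(scaling_advice(0.2, 1.0, 10, 100))   # add replicas
```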

And so on....

Best,
Erick



On Fri, Jan 22, 2016 at 3:30 PM, Aswath Srinivasan (TMS)
<aswath.sriniva...@toyota.com> wrote:
> Thanks guys for all the responses.
>
> True. What I wanted to convey is 2 shards with 4 replicas.
>
>>> use more shards if the query latency is too high.
>
> Shouldn't we go for more replicas if query latency is too high? You can go
> for more shards if you have a large number of documents to index, and at a
> frequent rate. Do you disagree with my point of view?
>
> There are no facets, but complex queries exist. I was thinking a safe bet is
> to have 2 shards, to give enough breathing space for the indexing jobs, and
> 4 replicas to address the high QPS. Am I thinking correctly?
>
> I cannot thank you guys enough!!
>
> Thank you,
> Aswath NS
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Friday, January 22, 2016 3:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Taking Solr to production
>
> "1 Leader & 3 Replicas"
>
> SolrCloud does not distinguish leaders from replicas - that's old 
> master-slave terminology. The leader is just one of the replicas.
>
> So, are you really talking about 2 shards with 4 replicas each or 2 shards 
> with 2 replicas each?
>
> Putting multiple replica instances on each machine isn't buying you anything, 
> just making it more complicated to manage.
>
> Number of shards is determined by amount of data and whether query latency 
> can be achieved - use more shards if the query latency is too high.
>
> 2.5 million (2,500,000) documents is rather small, so unless your queries are 
> running really slow, it's not clear you even need sharding, but we don't know 
> your document and query complexity. Heavy faceting or complex function 
> queries?
>
> Number of replicas is determined by query load - number of simultaneous query
> requests, as well as HA (high availability) requirements.
>
>
>
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen
> wrote:
>
>> Aswath Srinivasan (TMS) wrote:
>> > * Totally about 2.5 million documents to be indexed
>> > * Average document size is 512 KB - PDFs and HTML files
>>
>> > This being said I was thinking I would take the Solr to production with,
>> > * 2 shards, 1 Leader & 3 Replicas
>>
>> > Do you all think this setup will work? Will this serve me 150 QPS?
>>
>> It certainly helps that you are batch updating. What is missing in
>> this estimation is how large the documents are when indexed, as I
>> guess the ½MB average is for the raw files? If they are your everyday
>> short PDFs with images, meaning not a lot of text, handling 2M+ of
>> them is easy. If they are all full-length books, it is another matter.
>>
>> Your document count is relatively low and if your index data end up
>> being not-too-big (let's say 100GB), then you ought to consider having
>> just a single shard with 4 replicas: There is a non-trivial overhead
>> going from 1 shard to more than one, especially if you are doing faceting.
>>
>> - Toke Eskildsen
>>
