Thanks guys for all the responses.

True. What I wanted to convey is 2 shards with 4 replicas.

>> use more shards if the query latency is too high.

Shouldn't we go for more replicas if query latency is too high? You can go for 
more shards if you have a large number of documents to index at a high rate. 
Do you disagree with my point of view?

There are no facets, but there are complex queries. I was thinking a safe bet 
is 2 shards, to give the indexing jobs enough breathing space, and 4 replicas 
to handle the high QPS. Am I thinking correctly?
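
For concreteness, here is a rough sketch of how a 2 shards with 4 replicas layout could be requested through the Collections API CREATE action. The host, port, and the collection name "docs" are placeholders I made up, and the function only builds the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL. Nothing is sent to Solr here."""
    params = {
        "action": "CREATE",
        "name": name,  # hypothetical collection name
        "numShards": num_shards,
        # replicationFactor counts every copy of a shard, leader included.
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def total_cores(num_shards, replication_factor):
    """Number of cores the cluster has to host for this layout."""
    return num_shards * replication_factor


if __name__ == "__main__":
    # Placeholder base URL; adjust to the real cluster.
    print(create_collection_url("http://localhost:8983/solr", "docs", 2, 4))
    print(total_cores(2, 4))
```

With 2 shards and 4 replicas each, the cluster ends up hosting 8 cores in total, which matters when deciding how many machines to spread them over.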

I cannot thank you enough you guys!!

Thank you,
Aswath NS


-----Original Message-----
From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Friday, January 22, 2016 3:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Taking Solr to production

"1 Leader & 3 Replicas"

SolrCloud does not distinguish leaders from replicas - that's old master-slave 
terminology. The leader is just one of the replicas.

So, are you really talking about 2 shards with 4 replicas each or 2 shards with 
2 replicas each?

Putting multiple replica instances on each machine isn't buying you anything, 
just making it more complicated to manage.

Number of shards is determined by amount of data and whether query latency can 
be achieved - use more shards if the query latency is too high.

2.5 million (2,500,000) documents is rather small, so unless your queries are 
running really slow, it's not clear you even need sharding, but we don't know 
your document and query complexity. Heavy faceting or complex function queries?

Number of replicas is determined by query load - number of simultaneous query 
requests, as well as high availability (HA) requirements.
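
A back-of-the-envelope sketch of that sizing rule, assuming you have load-tested a single replica and know roughly how many queries per second it sustains at acceptable latency. The 40 QPS per replica figure below is purely hypothetical; only the 150 QPS target comes from this thread. The floor of 2 replicas is my assumption for basic HA.

```python
import math


def replicas_needed(target_qps, qps_per_replica, min_for_ha=2):
    """Estimate replicas per shard from query load, with a floor for HA.

    qps_per_replica must come from your own load test; min_for_ha is an
    assumed minimum so one replica can fail without losing the shard.
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    by_load = math.ceil(target_qps / qps_per_replica)
    return max(min_for_ha, by_load)


if __name__ == "__main__":
    # 150 QPS target, hypothetical 40 QPS measured per replica.
    print(replicas_needed(150, 40))
```

This is only a starting estimate; actual capacity depends on query complexity, caching, and hardware, so it should be checked against a real load test.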




-- Jack Krupansky

On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen
wrote:

> Aswath Srinivasan (TMS) wrote:
> > * Totally about 2.5 million documents to be indexed
> > * Documents average size is 512 KB - pdfs and htmls
>
> > This being said I was thinking I would take the Solr to production with,
> > * 2 shards, 1 Leader & 3 Replicas
>
> > Do you all think this setup will work? Will this serve me 150 QPS?
>
> It certainly helps that you are batch updating. What is missing in
> this estimation is how large the documents are when indexed, as I
> guess the ½MB average is for the raw files? If they are your everyday
> short PDFs with images, meaning not a lot of text, handling 2M+ of
> them is easy. If they are all full-length books, it is another matter.
>
> Your document count is relatively low and if your index data end up
> being not-too-big (let's say 100GB), then you ought to consider having
> just a single shard with 4 replicas: There is a non-trivial overhead
> going from 1 shard to more than one, especially if you are doing faceting.
>
> - Toke Eskildsen
>
