It boils down to whether the response rate when you query a single shard is "acceptable", plus some overhead for sharding.
So, if you need 100QPS and all you can get after tuning on a single shard (which you can test with &distrib=false) is 10QPS, you need 10 replicas. But if a single shard can only get you responses back in 10 seconds, you need more shards. And so on....

Best,
Erick

On Fri, Jan 22, 2016 at 3:30 PM, Aswath Srinivasan (TMS) <aswath.sriniva...@toyota.com> wrote:
> Thanks guys for all the responses.
>
> True. What I wanted to convey is 2 shards with 4 replicas.
>
>>> use more shards if the query latency is too high.
>
> Shouldn't we go for more replicas if query latency is too high? You can go
> for more shard if you have number of indexing documents and at a much
> frequent rate. Do you disagree with my point of view?
>
> There are no facets but complex queries exist. A safe bet is to have 2 shards
> is what I was thinking so I give enough breathing space for the indexing jobs
> and 4 replicas to address the high QPS request. Am I thinking correctly?
>
> I cannot thank you enough you guys!!
>
> Thank you,
> Aswath NS
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Friday, January 22, 2016 3:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Taking Solr to production
>
> "1 Leader & 3 Replicas"
>
> SolrCloud does not distinguish leaders from replicas - that's old
> master-slave terminology. The leader is just one of the replicas.
>
> So, are you really talking about 2 shards with 4 replicas each or 2 shards
> with 2 replicas each?
>
> Putting multiple replica instances on each machine isn't buying you anything,
> just making it more complicated to manage.
>
> Number of shards is determined by amount of data and whether query latency
> can be achieved - use more shards if the query latency is too high.
>
> 2.5 million (2,500,000) documents is rather small, so unless your queries are
> running really slow, it's not clear you even need sharding, but we don't know
> your document and query complexity. Heavy faceting or complex function
> queries?
>
> Number of replicas is determined by query load - number of simultaneous query
> requests, as well as HA availability requirements.
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen wrote:
>
>> Aswath Srinivasan (TMS) wrote:
>> > * Totally about 2.5 million documents to be indexed
>> > * Documents average size is 512 KB - pdfs and htmls
>>
>> > This being said I was thinking I would take the Solr to production with,
>> > * 2 shards, 1 Leader & 3 Replicas
>>
>> > Do you all think this set up will work? Will this server me 150 QPS?
>>
>> It certainly helps that you are batch updating. What is missing in
>> this estimation is how large the documents are when indexed, as I
>> guess the ½MB average is for the raw files? If they are your everyday
>> short PDFs with images, meaning not a lot of text, handling 2M+ of
>> them is easy. If they are all full-length books, it is another matter.
>>
>> Your document count is relatively low and if your index data end up
>> being not-too-big (let's say 100GB), then you ought to consider having
>> just a single shard with 4 replicas: There is a non-trivial overhead
>> going from 1 shard to more than one, especially if you are doing faceting.
>>
>> - Toke Eskildsen
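
For anyone who wants the rule of thumb from the reply at the top of this thread as arithmetic, here is a minimal Python sketch. The function names, the 1-second latency target, and the 10 QPS single-shard figure applied to the 150 QPS question are all assumptions for illustration, not measurements, and the sketch ignores the sharding overhead mentioned in the thread.

```python
import math


def replicas_needed(target_qps: float, single_shard_qps: float) -> int:
    """Replicas needed to reach target_qps, given the QPS one tuned
    replica sustains (measured with distrib=false). Ignores overhead."""
    if target_qps <= 0 or single_shard_qps <= 0:
        raise ValueError("QPS values must be positive")
    return math.ceil(target_qps / single_shard_qps)


def needs_more_shards(single_shard_latency_s: float,
                      acceptable_latency_s: float) -> bool:
    """More replicas do not make one query faster; if a single shard is
    too slow per query, the index has to be split into more shards."""
    return single_shard_latency_s > acceptable_latency_s


if __name__ == "__main__":
    # The example from the reply: 100 QPS wanted, 10 QPS per replica.
    print(replicas_needed(100, 10))
    # Hypothetical: the 150 QPS target with an assumed 10 QPS per replica.
    print(replicas_needed(150, 10))
    # A 10-second response against an assumed 1-second target.
    print(needs_more_shards(10.0, 1.0))
```

Rounding up with math.ceil is deliberate: a fractional replica has to be provisioned as a whole one.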