Re "1 Leader & 3 Replicas": SolrCloud does not count leaders separately from replicas; that's old master-slave terminology. The leader is just one of the replicas.
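To make the counting concrete: in the SolrCloud Collections API, `replicationFactor` is the total number of replicas per shard, and the leader is one of them. So "2 shards, 1 leader & 3 replicas" would be `numShards=2&replicationFactor=4`. The sketch below only builds the CREATE request URL. The host, the collection name `docs`, and the helper names are hypothetical, and it deliberately leaves out other parameters (such as node placement) that a real deployment may need.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE request URL (no request is sent).

    replication_factor is the total number of replicas per shard;
    the leader is one of them, not an extra copy on top.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def total_cores(num_shards: int, replication_factor: int) -> int:
    """Number of Solr cores the collection will have across the cluster."""
    return num_shards * replication_factor


# Hypothetical host and collection name, for illustration only.
BASE = "http://localhost:8983/solr"

# "2 shards, 1 leader & 3 replicas" -> 4 replicas per shard.
two_shard = create_collection_url(BASE, "docs", num_shards=2, replication_factor=4)

# A single shard with 4 replicas, for comparison.
one_shard = create_collection_url(BASE, "docs", num_shards=1, replication_factor=4)

print(two_shard)
print(one_shard)
print(total_cores(2, 4))
```

The two-shard layout means 8 cores to place and manage, versus 4 for the single-shard one.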
So, are you really talking about 2 shards with 4 replicas each, or 2 shards with 2 replicas each? Putting multiple replica instances on each machine isn't buying you anything; it just makes the cluster more complicated to manage.

The number of shards is determined by the amount of data and by whether your query latency target can be met: use more shards if query latency is too high. 2.5 million documents is rather small, so unless your queries are running really slowly, it's not clear you even need sharding, but we don't know your document and query complexity. Are you doing heavy faceting or complex function queries?

The number of replicas is determined by query load (the number of simultaneous query requests) as well as by high-availability (HA) requirements.

-- Jack Krupansky

On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> Aswath Srinivasan (TMS) <aswath.sriniva...@toyota.com> wrote:
> > * In total, about 2.5 million documents to be indexed
> > * Average document size is 512 KB (PDFs and HTML)
>
> > This being said, I was thinking I would take Solr to production with:
> > * 2 shards, 1 Leader & 3 Replicas
>
> > Do you all think this setup will work? Will this serve me 150 QPS?
>
> It certainly helps that you are batch updating. What is missing in this
> estimation is how large the documents are when indexed, as I guess the ½MB
> average is for the raw files? If they are your everyday short PDFs with
> images, meaning not a lot of text, handling 2M+ of them is easy. If they
> are all full-length books, it is another matter.
>
> Your document count is relatively low and if your index data end up being
> not-too-big (let's say 100GB), then you ought to consider having just a
> single shard with 4 replicas: there is a non-trivial overhead going from 1
> shard to more than one, especially if you are doing faceting.
>
> - Toke Eskildsen
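The rule of thumb that replicas follow query load and HA needs can be turned into a rough calculation. Because a distributed query is sent to one replica of every shard, it is mainly replicas per shard, not shard count, that add query throughput. This is a minimal sketch under stated assumptions, not a Solr feature: `replicas_needed` and its parameters are hypothetical, and the per-replica throughput has to come from your own load test with realistic queries, since it depends on the document and query complexity discussed above.

```python
import math


def replicas_needed(target_qps: float,
                    measured_qps_per_replica: float,
                    min_replicas_for_ha: int = 2) -> int:
    """Rough replicas-per-shard estimate (hypothetical helper).

    target_qps: peak query rate the collection must sustain (e.g. 150).
    measured_qps_per_replica: what ONE replica sustained in your own
        load test at acceptable latency; an assumption you must measure.
    min_replicas_for_ha: floor so that losing one node does not take
        the shard offline.
    """
    if measured_qps_per_replica <= 0:
        raise ValueError("measured_qps_per_replica must be positive")
    # Enough replicas to spread the query load...
    for_load = math.ceil(target_qps / measured_qps_per_replica)
    # ...but never fewer than the HA floor.
    return max(for_load, min_replicas_for_ha)


# Hypothetical load-test results against a 150 QPS target:
print(replicas_needed(150, 50))   # 3 replicas carry the load
print(replicas_needed(150, 40))   # 3.75 rounds up to 4
print(replicas_needed(150, 200))  # load needs 1, HA floor raises it to 2
```

For example, if a single replica held 40 QPS at acceptable latency in your test, the sketch suggests 4 replicas per shard; if it held 200 QPS, the HA floor rather than the load decides the count.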