A collection in SolrCloud is a logical entity that encapsulates documents that conform to a shared schema. Because SolrCloud is a distributed system, the data needs to be split, so the collection is logically split into 'shards'.

Shard(s):
* don't represent a physical index
* are logical entities
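
To make the "logical split" a bit more concrete, here is a rough sketch of how the documents of a collection could be divided among shards. This is only an illustration, not Solr's actual routing code: the function names (`shard_for`, `split_collection`) and the simple hash-modulo scheme are my own assumptions, chosen for brevity.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> str:
    """Return the name of the logical shard a document id falls into.

    Assumption: a plain CRC32 hash modulo the shard count. This is a
    stand-in for illustration only, not how Solr routes documents.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))
    return f"shard{h % num_shards + 1}"


def split_collection(doc_ids, num_shards):
    """Group document ids by logical shard.

    The result is only a mapping of shard name -> document ids; no
    physical index is created, which is the sense in which a shard
    is a logical entity.
    """
    shards = {f"shard{i}": [] for i in range(1, num_shards + 1)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(20)]
    for name, ids in split_collection(docs, 4).items():
        print(name, ids)
```

The point to take away is that every document belongs to exactly one shard, and the same id always maps to the same shard.
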

Replica:
* is the physical manifestation of a shard
* is an actual Lucene index
* can therefore independently serve requests and accept document updates
* unlike the dictionary meaning, is not a 'replica' of anything; it is just a physical manifestation (I'm repeating this, I know)

Moving on: for each shard, a few things need a single controlling point, e.g. versioning the incoming documents and maintaining optimistic concurrency. One of the replicas of each shard is given those responsibilities and is called the 'leader'. The leader changes via leader election; I'm not going to go into the details of leader election and when it happens here. All other, non-leader replicas (we at times refer to them as followers) receive updates from the leader, which versions the documents.

To sum it up with an analogy, if you are a Java developer: collections and shards are classes, but replicas are objects.

Imagine a 'wikipedia' collection. It may have 10 shards that split all of Wikipedia into 10 parts for the sake of manageability. Depending upon our traffic, we may choose the number of replicas (called the replication factor) for each shard.

*NOTE*: a replication factor of 1 means there is 1 replica for each shard, i.e. there is ONE physical index for each shard definition. In such a case, this replica would also be the leader. If the replication factor were 2, there would be 2 physical index copies of each shard, and one of the 2 would be assigned the role of leader.

Hope this helps.

On Wed, Jul 6, 2016 at 2:32 PM, John Doe <mailinglists...@gmail.com> wrote:

> Hey,
>
> I have have the same question on freenode channel , people answered me ,
> but I believe that I still got doubts. Just because I never had approach to
> such data store technologies before it makes me hardly understand what is
> exactly is replica and shard in solr. I believe once I understand what
> exactly are these two, then I would be able to see the difference.
>
> According to English dictionary replica is exact copy of something, which
> sounds like a true to me, but what is shard then here and how is it
> connected with all this context ? Can someone explain this in brief and
> give some examples ?
>
> Thank you in advance
>

--
Anshum Gupta