Shawn, thanks for the detailed answer. I have 5 shards, each with 1 leader and 1 replica, so 10 Solr nodes in total. When I look at the admin GUI of one of the shard leaders, I see that its replica's index is smaller (in MB) than the leader's. I am not updating the data or indexing new documents. I expected the leader to sync its replica after some time, but nothing has changed.

2013/5/1 Shawn Heisey <s...@elyograg.org>

> On 4/30/2013 8:33 AM, Furkan KAMACI wrote:
>
>> I think that replication occurs after commit by default. It has been long
>> time however there is still mismatch between leader and replica
>> (approximately 5 MB). I tried to pull indexes from leader but it is still
>> same.
>>
>
> My mail server has been down most of the day, and the Apache mail
> infrastructure hasn't noticed yet that I'm back up. I don't have copies of
> the newest messages on this thread. I checked the web archive to see what
> else has been said. I'll be repeating some of what has been said before.
>
> On SolrCloud terminology: SolrCloud divides your index into one or more
> shards, each of which has a different piece of the index. Each shard is
> made up of replicas. One replica in each shard is designated leader. Note:
> a leader is still a replica, it is just the winner of the latest leader
> election. Summary: shards, replicas, leader.
>
> One term that you are using is "follower" ... this is not a valid
> SolrCloud term. It might make sense to use this term for a replica that is
> not a leader, but I have never seen it used in anything official. Any
> replica can become leader, if the conditions are just right.
>
> There are only two times that the leader replica has special significance
> - when you are indexing and when a replica starts operation, either as an
> existing replica that went down or as a new replica.
>
> In SolrCloud, replication is *NOT* used when you index new data. The
> *ONLY* time that replication happens in SolrCloud is when a replica
> starts up, and even then it will only happen if the leader cannot figure
> out how to use its transaction log to sync the replica.
>
> SolrCloud does distributed indexing. This means that when an update comes
> in, SolrCloud determines which shard needs that update. If the core that
> received the request is not the leader of that shard, the request is
> forwarded to the correct leader. That leader will index the update and
> send it to all of the replicas for that shard, each of which will index the
> update independently.
>
> Because each replica indexes independently, you can end up with different
> sizes. The actual search results should be the same, although scoring can
> sometimes be a little bit different between replicas because deleted
> documents that exist in one replica but not another will contribute to the
> score. SolrCloud does not attempt to keep the replicas absolutely
> identical, as long as they contain the same non-deleted documents.
>
> Thanks,
> Shawn
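
One way to check whether a size difference like this is only deleted documents that have not been merged away is to compare the document counts of the two cores rather than their size on disk. The following is a minimal sketch, not an official tool. It assumes the CoreAdmin STATUS response exposes `numDocs`, `maxDoc` and `sizeInBytes` under `status -> <core> -> index`, and the function names, base URLs and core names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_core_index_stats(base_url, core):
    """Return the 'index' section of a CoreAdmin STATUS response.

    base_url is assumed to look like http://host:8983/solr (hypothetical);
    core is the name of the core hosted on that node.
    """
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    with urlopen(f"{base_url}/admin/cores?{query}") as resp:
        body = json.load(resp)
    return body["status"][core]["index"]


def compare_replicas(leader_index, replica_index):
    """Compare two 'index' dicts taken from a leader and its replica.

    numDocs counts live documents; maxDoc also counts deleted documents
    that are still present in the segments, so maxDoc - numDocs is the
    number of deleted documents each core is still carrying.
    """
    def summarize(index):
        return {
            "live": index["numDocs"],
            "deleted": index["maxDoc"] - index["numDocs"],
            "bytes": index["sizeInBytes"],
        }

    leader = summarize(leader_index)
    replica = summarize(replica_index)
    return {
        "leader": leader,
        "replica": replica,
        "same_live_docs": leader["live"] == replica["live"],
        "size_diff_bytes": leader["bytes"] - replica["bytes"],
    }


# Usage against a running cluster (hosts and core names are placeholders):
#   leader = fetch_core_index_stats("http://host1:8983/solr", "collection1")
#   replica = fetch_core_index_stats("http://host2:8983/solr", "collection1")
#   print(compare_replicas(leader, replica))
```

If `same_live_docs` is true and the two cores differ only in their deleted count and size, that is consistent with the explanation above: each replica indexed and merged independently. Matching live counts do not prove that the documents themselves are identical, so treat this as a quick sanity check only.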