On 4/30/2013 8:33 AM, Furkan KAMACI wrote:
> I think that replication occurs after commit by default. It has been long
> time however there is still mismatch between leader and replica
> (approximately 5 MB). I tried to pull indexes from leader but it is still
> same.

My mail server has been down most of the day, and the Apache mail infrastructure hasn't noticed yet that I'm back up. I don't have copies of the newest messages on this thread. I checked the web archive to see what else has been said. I'll be repeating some of what has been said before.

On SolrCloud terminology: SolrCloud divides your index into one or more shards, each of which has a different piece of the index. Each shard is made up of replicas. One replica in each shard is designated leader. Note: a leader is still a replica, it is just the winner of the latest leader election. Summary: shards, replicas, leader.

One term that you are using is "follower" ... this is not a valid SolrCloud term. It might make sense to use this term for a replica that is not a leader, but I have never seen it used in anything official. Any replica can become leader, if the conditions are just right.

There are only two times that the leader replica has special significance - when you are indexing and when a replica starts operation, either as an existing replica that went down or as a new replica.

In SolrCloud, replication is *NOT* used when you index new data. The *ONLY* time that replication happens in SolrCloud is when a replica starts up, and even then it will only happen if the leader cannot figure out how to use its transaction log to sync the replica.
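
To make that startup decision concrete, here is a rough sketch in Python. It is not Solr code. The function name, the integer version numbers and the size of the "recent updates" window are all assumptions I made up for illustration; the only thing taken from the description above is the idea of trying a transaction log sync first and falling back to replication.

```python
def plan_recovery(leader_versions, replica_versions, window=100):
    """Decide how a starting replica catches up (illustrative model only).

    leader_versions: ascending list of update versions the leader has seen.
    replica_versions: set of update versions the replica already has.
    window: hypothetical number of recent updates the leader can still
            replay from its transaction log (assumed to be positive).
    """
    # Updates the leader could replay from its transaction log.
    recent = set(leader_versions[-window:])
    # Updates the replica missed while it was down.
    missing = [v for v in leader_versions if v not in replica_versions]
    if all(v in recent for v in missing):
        # Everything missed is still in the log: replay, no replication.
        return ("sync", missing)
    # The gap is too large for the log: copy the index instead.
    return ("replicate", [])


leader = list(range(1, 11))                      # updates 1..10
print(plan_recovery(leader, set(range(1, 9)), window=5))
print(plan_recovery(leader, set(range(1, 4)), window=5))
```

The first call models a replica that was down briefly, the second one that missed more than the log can cover.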

SolrCloud does distributed indexing. This means that when an update comes in, SolrCloud determines which shard needs that update. If the core that received the request is not the leader of that shard, the request is forwarded to the correct leader. That leader will index the update and send it to all of the replicas for that shard, each of which will index the update independently.
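
The flow of a single update can be sketched like this. This is a toy model in Python, not how Solr is implemented. The class names, `pick_shard` and `handle_update` are hypothetical, and the CRC32-modulo hash is only a stand-in I chose so the example runs; it is not the hashing SolrCloud actually uses to assign documents to shards.

```python
import zlib


class Replica:
    def __init__(self, name):
        self.name = name
        self.docs = {}          # this replica's own copy of the index

    def index(self, doc):
        self.docs[doc["id"]] = doc


class Shard:
    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas
        self.leader = replicas[0]   # stand-in for a leader election


def pick_shard(shards, doc_id):
    # Assumption: a simple hash of the id, modulo the shard count.
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def handle_update(shards, receiving_replica, doc):
    """Route one update and return (target shard, list of forwarding hops)."""
    shard = pick_shard(shards, doc["id"])
    hops = []
    if receiving_replica is not shard.leader:
        # The request landed on the wrong core: forward it to the leader.
        hops.append((receiving_replica.name, shard.leader.name))
    shard.leader.index(doc)
    for replica in shard.replicas:
        if replica is not shard.leader:
            # Each replica indexes the document itself; no index files move.
            hops.append((shard.leader.name, replica.name))
            replica.index(doc)
    return shard, hops


a1, a2, b1, b2 = Replica("a1"), Replica("a2"), Replica("b1"), Replica("b2")
shards = [Shard("shard1", [a1, a2]), Shard("shard2", [b1, b2])]
target, hops = handle_update(shards, b2, {"id": "doc-42"})
print(target.name, hops)
```

Whichever replica receives the request, the document ends up on every replica of exactly one shard, and nothing is copied at the file level.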

Because each replica indexes independently, you can end up with different index sizes on disk. The actual search results should be the same, although scoring can sometimes be a little bit different between replicas, because deleted documents that still exist in one replica's index but not in another's will contribute to the score. SolrCloud does not attempt to keep the replicas absolutely identical, as long as they contain the same non-deleted documents.
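
Here is a small Python illustration of why that happens. It is a simplified model, not Lucene code: the dictionaries and helper names are made up, and the idf formula is a classic TF-IDF style formula that I am assuming for the example. The point is only that the statistics count every document still in the index, deleted or not.

```python
import math


def live_ids(replica):
    """IDs a search can actually return (deleted documents are hidden)."""
    return {doc_id for doc_id, doc in replica.items() if not doc["deleted"]}


def idf(replica, term):
    """Assumed classic idf: 1 + ln(maxDoc / (docFreq + 1)).

    Both counts include deleted documents that are still in the index.
    """
    max_doc = len(replica)
    doc_freq = sum(1 for doc in replica.values() if term in doc["terms"])
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Replica A still holds an old, deleted version of document 1.
replica_a = {
    "1": {"terms": {"solr"}, "deleted": False},
    "2": {"terms": {"cloud"}, "deleted": False},
    "3": {"terms": {"cloud"}, "deleted": False},
    "1-old": {"terms": {"solr"}, "deleted": True},
}
# Replica B has already dropped the deleted document.
replica_b = {k: v for k, v in replica_a.items() if not v["deleted"]}

print(live_ids(replica_a) == live_ids(replica_b))   # same searchable docs
print(len(replica_a), len(replica_b))               # different index sizes
print(idf(replica_a, "solr"), idf(replica_b, "solr"))
```

Both replicas return the same documents, but the one carrying the extra deleted document is larger and computes a slightly different idf for "solr".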

Thanks,
Shawn
