What version of Solr are you using? What GC parameters are you using? Do
you have GC logs enabled? Look at the full GC times in those logs and see
what's happening. This particular problem is usually caused by replicas
that cannot keep up with the rate of updates, so they fall back into
recovery. You should also check the leader logs to see what kinds of
exceptions are being logged.
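
If GC logging isn't already on, flags along these lines (for the HotSpot
JVMs that Solr of that era usually runs on; the log path is just a
placeholder) will give you the pause times to look at:

    -verbose:gc
    -Xloggc:/path/to/solr_gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime

Long "Total time for which application threads were stopped" entries
around the time the replicas drop into recovery would point at GC pauses
expiring the ZooKeeper sessions.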

Also, do multiple shards share the same disk? If yes, then creating so many
shards might not help because the disk will become a bottleneck.
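
If you're not sure whether the disk is the point of contention, watching
iostat on the Solr nodes while the indexing job runs should make it
obvious (assuming the sysstat package is installed):

    iostat -dx 2

Sustained high %util and large await values on the device holding the
index directories during indexing would confirm that the disk is the
bottleneck.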

On Sat, Jan 24, 2015 at 3:40 AM, gouthsmsimhadri <gouthamsimha...@gmail.com>
wrote:

> I'm working with a cluster of SolrCloud servers configured with 10
> shards and 4 replicas per shard in a stress environment.
> The planned production configuration is 10 shards and 15 replicas per
> shard.
>
> The current commit settings are as follows:
>
>         <autoSoftCommit>
>             <maxDocs>500000</maxDocs>
>             <maxTime>180000</maxTime>
>         </autoSoftCommit>
>
>         <autoCommit>
>             <maxDocs>2000000</maxDocs>
>             <maxTime>180000</maxTime>
>             <openSearcher>false</openSearcher>
>         </autoCommit>
>
>
> The application needs to index approximately 90 million docs, which is
> done in two ways:
> a)      Full indexing. It takes 4 hours to index the 90 million docs, and
> the rate of docs coming to the searchers is around 6000 per second.
> b)      Incremental indexing. It takes an hour to index the delta changes.
> There are roughly 3 million changes, and the rate of docs coming to the
> searchers is around 2500 per second.
>
> I use two collections, for example collection1 and collection2.
> Each collection runs on nodes with 12 GB of available RAM and a quad-core
> Intel(R) Xeon(R) CPU X5570 @ 2.93GHz.
>
> Full indexing is always performed on the collection that is not serving
> live traffic. Once the job is completed, we swap the collections so that
> the one with the latest data serves traffic and the other becomes inactive.
>
> Incremental indexing, on the other hand, is always performed on the
> collection that is serving live traffic.
>
> The problem is that about 10 minutes after indexing is triggered, the
> replicas go into recovery mode. This happens on all the shards. After
> about 20 minutes more of the replicas start to fall into recovery mode,
> and within about half an hour all replicas except the leaders are in
> recovery mode.
>
> I cannot throttle the indexing load, as that would increase our overall
> indexing time. So, to work around this issue, I remove all the replicas
> before indexing starts and add them back after it completes.
>
> This behavior (replicas falling into recovery mode) is especially
> troublesome during incremental indexing, as I cannot remove replicas then
> because they are serving live traffic. I tried to throttle the speed at
> which documents are indexed, but with no success; the cluster still goes
> into recovery.
>
> If I leave the cluster as is, the indexing eventually completes and the
> replicas recover after a while, but since this collection is serving live
> traffic I just cannot let the replicas go into recovery mode, as it also
> degrades search performance (based on the tests performed).
>
> I tried different commit settings, such as the following:
> a)      No auto soft commit, no auto hard commit, and a commit triggered
> at the end of indexing
> b)      No auto soft commit, auto hard commit enabled, and a commit at the
> end of indexing
> c)      Auto soft commit enabled, no auto hard commit
> d)      Auto soft commit enabled, auto hard commit enabled
> e)      Different frequency settings for the commits above
>
> Unfortunately, all of the above yield the same behavior: the replicas
> still go into recovery.
>
> I have increased the ZooKeeper timeout from 30 seconds to 5 minutes, and
> the problem persists.
>
> Is there any setting that would fix this issue?
>
>
>
>
> -----
>  -goutham



-- 
Regards,
Shalin Shekhar Mangar.
