Re: SolrCloud recovery

Hendrik Haddorp Fri, 25 Jan 2019 13:59:03 -0800

On a system with about 1600 collections, each having one shard and a replication factor of two, it took around an hour to recover completely after an instance restart. The setup used HDFS for storage, and we are using Solr 7.4 at the moment. The Overseer queue management helped us a lot! Before that, Solr could easily spiral into death if the queue grew too fast. I haven't checked the logs to see what the recovery is doing. Is there anything specific to look for?

During the recovery one can see Solr going over the replicas one by one, never really working on more than about 5 replicas at a time, often fewer. The progress also seems to happen in alphabetical order. I believe that used to be different in older versions. I will give the coreLoadThreads setting a try.


Hendrik

On 25.01.2019 16:51, Erick Erickson wrote:

That's just _loading_, recovery happens later so I'd
be surprised if this really made a difference, but you
never know.

I'm more interested in _why_ recovery takes so long,
and why recovery happens in the first place. It's normal
for replicas to go from down->recovering->active when starting up;
that's just part of the normal cycle. But the recovering state
should be relatively short unless the replica has to copy the
index from the leader.

If active indexing is going on, then the replicas may have to
copy their index down from the leader. Does this happen
on a system that is not indexing?

What version of Solr? All the state changes go through
the Overseer, and there were some very significant improvements
in Solr 6.6+, see:
https://issues.apache.org/jira/browse/SOLR-10265

And can you put a number to "rather long"? There's a built-in
3-minute wait for leader election if there's no leader for
a slice. That's not relevant if the replica in recovery
belongs to a shard that already has a leader, but if you
restart your entire cluster it can come into play.
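If that startup wait is what you hit on a full-cluster restart, my understanding is that it is controlled by the leaderVoteWait setting in the <solrcloud> section of solr.xml, in milliseconds (180000 ms being the 3 minutes above). A minimal sketch, assuming a Solr 7.x-style solr.xml; only the relevant element is shown, and 60000 is purely an illustrative value. Lowering it is a trade-off: a node then waits less time for all known replicas of a shard to report in before electing a leader, so a replica with newer updates could be passed over.

```xml
<solr>
  <solrcloud>
    <!-- Illustrative value only: wait 60 s instead of the 180000 ms default
         for all known replicas of a shard before leader election proceeds.
         ${leaderVoteWait:60000} lets -DleaderVoteWait=... override it. -->
    <int name="leaderVoteWait">${leaderVoteWait:60000}</int>
    <!-- keep your existing solrcloud settings here -->
  </solrcloud>
</solr>
```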

Best,
Erick

On Fri, Jan 25, 2019 at 3:32 AM Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:

Thanks, that sounds good. Didn't know that parameter.

On 25.01.2019 11:23, Vadim Ivanov wrote:

You can try to tweak solr.xml:

coreLoadThreads
Specifies the number of threads that will be assigned to load cores in parallel.

https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html
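A minimal sketch of where the setting goes, assuming the layout described in the guide above: coreLoadThreads is a top-level child of <solr>, not part of <solrcloud>. The value 16 is an arbitrary example, not a recommendation. As noted elsewhere in this thread, this only affects how many cores are loaded in parallel at startup; it does not directly control how many replicas recover at once.

```xml
<solr>
  <!-- Threads used to load cores in parallel at startup.
       16 is an illustrative value; -DcoreLoadThreads=... can override it. -->
  <int name="coreLoadThreads">${coreLoadThreads:16}</int>
  <solrcloud>
    <!-- existing SolrCloud settings stay as they are -->
  </solrcloud>
</solr>
```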

-----Original Message-----
From: Hendrik Haddorp [mailto:hendrik.hadd...@gmx.net]
Sent: Friday, January 25, 2019 11:39 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud recovery

Hi,

I have a SolrCloud with many collections. When I restart an instance and
the replicas are recovering, I noticed that the number of replicas recovering at
any one time is usually around 5. This makes the recovery take
rather long. Is there a configuration option that controls how many
replicas can recover in parallel?

thanks,
Hendrik
