Do you have the logs from just before the following?

> we notice that the nodes go into "Recovering" state for about 10-12 hours
> before finally coming alive.

Is there a peersync failure or something else in the logs indicating why there is a full recovery?

Kevin Risden

On Wed, Dec 5, 2018 at 12:53 PM lstusr 5u93n4 <lstusr...@gmail.com> wrote:

> Hi All,
>
> We have a collection:
> - solr 7.5
> - 3 shards, replication factor 2 for a total of 6 NRT replicas
> - 3 servers, 16GB ram each
> - 2 billion documents
> - autoAddReplicas: false
> - 2.1 TB on-disk index size
> - index stored on hdfs on separate servers.
>
> If we (gracefully) shut down solr on all 3 servers, when we re-launch solr
> we notice that the nodes go into "Recovering" state for about 10-12 hours
> before finally coming alive.
>
> During this recovery time, we notice high network traffic outbound from our
> HDFS servers to our solr servers. The sum total of which is roughly
> equivalent to the index size on disk.
>
> So it seems to us that on startup, solr has to re-read the entire index
> before coming back alive.
>
> 1. is this assumption correct?
> 2. is there any way to mitigate this, so that solr can launch faster?
>
> Thanks!
>
> Kyle
>