Rolling restarts work fine for us, and I often install new configs as part of one. Here is our script. Pass it any hostname in the cluster; I use the load balancer name. You’ll need to change the domain and the install directory, of course.
#!/bin/bash

cluster=$1

hosts=`curl -s "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" | jq -r '.cluster.live_nodes[]' | sort`

for host in $hosts
do
    # live_nodes entries look like "solr-220:8983_solr"; keep only the host part
    host="${host%%:*}.cloud.cheggnet.com"
    echo "restarting Solr on $host"
    ssh "$host" 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin bin/solr start -cloud -h `hostname`'
done

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 20, 2017, at 1:42 PM, Bill Oconnor <bocon...@plos.org> wrote:
>
> Hello,
>
> Background:
>
> We have been successfully using Solr for over 5 years, and we recently
> made the decision to move to SolrCloud. For the most part that has been
> easy, but we have repeated problems with our rolling restart, where
> servers remain functional but stay in recovery until they stop trying.
> We restarted because we increased the JVM memory from 12GB to 16GB.
>
> Does anyone have any insight as to what is going on here?
>
> Is there a special procedure I should use for stopping and starting a host?
>
> Is it OK to do a rolling restart on all the nodes in a shard?
>
> Any insight would be appreciated.
>
> Configuration:
>
> We have a group of servers with multiple collections. Each collection
> consists of one shard and multiple replicas. We are running the latest
> stable version of SolrCloud, 6.6, on Ubuntu LTS with Oracle Corporation
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17.
>
> (collection)      (shard)   (replicas)
> journals_stage -> shard1 -> solr-220 (leader), solr-223, solr-221, solr-222
>
> Problem:
>
> Restarting the system puts the replicas in a recovery state they never
> exit. They eventually give up after 500 tries. If I go to the individual
> replicas and execute a query, the data is still available.
>
> Using tcpdump, I see the replicas sending this request to the leader (the
> leader appears to be active).
>
> The exchange goes like this:
>
> solr-220 is the leader.
>
> solr-221 to solr-220:
>
> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>
> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
> Content-Length: 108
> Host: solr-220:8983
> Connection: Keep-Alive
>
> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
>
> solr-220 back to solr-221:
>
> IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, options [nop,nop,TS val 858155553 ecr 858107069], length 5151
> ..HTTP/1.1 500 Server Error
> Content-Type: application/octet-stream
> Content-Length: 5060
>
> .responseHeader..&statusT..%QTimeC.%error..#msg?.For input string: "1578578283947098112".%trace?.&java.lang.NumberFormatException: For input string: "1578578283947098112"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Integer.parseInt(Integer.java:583)
>     at java.lang.Integer.parseInt(Integer.java:615)
>     at org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
>     at org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
>     at org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
>     at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
>     at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
>     at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
>     at org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
>     at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)
>     at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)
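The exception in that trace is easy to sanity-check: Integer.parseInt only accepts values up to 2147483647, and the rejected string is roughly 1.6e18. The bash sketch below compares the value against Java's int and long limits. It assumes bash integers are 64-bit, which holds on common Linux and macOS builds but is worth confirming on your own shell. One plausible reading, an assumption not confirmed anywhere in this thread, is that this is a 64-bit _version_ value being read through an int-typed field, which would match the IntDocValues frame in the trace.

```shell
#!/bin/bash
# Value copied from the NumberFormatException in the quoted stack trace.
value=1578578283947098112

# Java int is 32-bit signed: Integer.MAX_VALUE is 2^31 - 1.
int_max=$(( (1 << 31) - 1 ))

# Java long is 64-bit signed: Long.MAX_VALUE is 2^63 - 1.
# Written as a literal because (1 << 63) - 1 would overflow bash arithmetic.
long_max=9223372036854775807

# Assumes 64-bit bash arithmetic so the comparisons below are exact.
if (( value > int_max )); then
  echo "too large for a Java int (max $int_max)"
fi

if (( value <= long_max )); then
  echo "fits in a Java long (max $long_max)"
fi
```

If both lines print, Integer.parseInt can never succeed on this value, so whatever reads it needs to treat it as a long.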