The NumberFormatException is... odd. Clearly that's too big a number for an integer. Did anything in the underlying schema change?
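To make the size problem concrete, here is a minimal sketch, not Solr code: the class and helper names are made up for illustration, and only the input string comes from the trace quoted below. It shows that the value cannot be parsed as a 32-bit int but parses fine as a 64-bit long, which would fit a long-valued field being treated as an int somewhere. Whether that is what actually happened on this cluster is an assumption.

```java
public class VersionValueCheck {

    // Returns true if the string parses as a 32-bit signed int.
    static boolean fitsInInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns true if the string parses as a 64-bit signed long.
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The input string reported in the NumberFormatException.
        String value = "1578578283947098112";

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("Long.MAX_VALUE    = " + Long.MAX_VALUE);
        System.out.println("fits in int:  " + fitsInInt(value));
        System.out.println("fits in long: " + fitsInLong(value));
    }
}
```

If the field in question was declared as a long type in an earlier schema and is now an int type (or a function query assumes int), that would be one thing to check first.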
Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> Rolling restarts work fine for us. I often include installing new configs
> with that. Here is our script. Pass it any hostname in the cluster. I use the
> load balancer name. You’ll need to change the domain and the install
> directory of course.
>
> #!/bin/bash
>
> cluster=$1
>
> # Ask any node in the cluster for the list of live nodes.
> hosts=`curl -s "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" | jq -r '.cluster.live_nodes[]' | sort`
>
> # Stop and start Solr on each node in turn.
> for host in $hosts
> do
>     host="${host}.cloud.cheggnet.com"
>     echo restarting Solr on $host
>     ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin bin/solr start -cloud -h `hostname`'
> done
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor <bocon...@plos.org> wrote:
>>
>> Hello,
>>
>> Background:
>>
>> We have been successfully using Solr for over 5 years and we recently made
>> the decision to move into SolrCloud. For the most part that has been easy,
>> but we have repeated problems with our rolling restart, where servers remain
>> functional but stay in Recovery until they stop trying. We restarted because
>> we increased the memory from 12GB to 16GB on the JVM.
>>
>> Does anyone have any insight as to what is going on here?
>>
>> Is there a special procedure I should use for starting and stopping a host?
>>
>> Is it OK to do a rolling restart on all the nodes in a shard?
>>
>> Any insight would be appreciated.
>>
>> Configuration:
>>
>> We have a group of servers with multiple collections. Each collection
>> consists of one shard and multiple replicates.
>> We are running the latest stable version of SolrCloud 6.6 on Ubuntu LTS
>> and Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>
>> (collection)      (shard)     (replicates)
>>
>> journals_stage -> shard1 -> solr-220 (leader), solr-223, solr-221,
>> solr-222 (replicates)
>>
>> Problem:
>>
>> Restarting the system puts the replicates in a recovery state they never
>> exit from. They eventually give up after 500 tries. If I go to the
>> individual replicates and execute a query, the data is still available.
>>
>> Using tcpdump I find the replicates sending this request to the leader (the
>> leader appears to be active).
>>
>> The exchange goes like this:
>>
>> solr-220 is the leader.
>>
>> Solr-221 to Solr-220
>>
>> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>>
>> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
>> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
>> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
>> Content-Length: 108
>> Host: solr-220:8983
>> Connection: Keep-Alive
>>
>> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
>>
>> Solr-220 back to Solr-221
>>
>> IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235,
>> options [nop,nop,TS val 858155553 ecr 858107069], length 5151
>> ..HTTP/1.1 500 Server Error
>> Content-Type: application/octet-stream
>> Content-Length: 5060
>>
>> .responseHeader..&statusT..%QTimeC.%error..#msg?.For input string:
>> "1578578283947098112".%trace?.&java.lang.NumberFormatException: For
>> input string: "1578578283947098112"
>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>>     at java.lang.Integer.parseInt(Integer.java:583)
>>     at java.lang.Integer.parseInt(Integer.java:615)
>>     at org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
>>     at org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
>>     at org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
>>     at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
>>     at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
>>     at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
>>     at org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
>>     at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)
>>     at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)