Clearing out its tlogs before starting it again may help.

- Mark
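A minimal sketch of what that tlog cleanup could look like, in case it's useful. This is not from the thread: the directory below is an assumed example path (whatever <dataDir>/tlog resolves to for the affected core, dsc-shard5-core2 here), and it should only be run while the node is stopped.

// Hypothetical helper: deletes the tlog files for one core while the node is down.
// Path is an assumption -- adjust to the core's real <dataDir>/tlog. The index
// files themselves are not touched.
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClearTlogs {
    public static void main(String[] args) throws IOException {
        Path tlogDir = Paths.get("/opt/solr/dsc-shard5-core2/data/tlog"); // assumed location
        if (Files.isDirectory(tlogDir)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(tlogDir)) {
                for (Path entry : entries) {
                    System.out.println("deleting " + entry);
                    Files.delete(entry);
                }
            }
        }
    }
}

The idea, per the suggestion above, is that without the stale tlog the replica can no longer claim newer versions than the leader when it comes back up.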
On Apr 2, 2013, at 10:07 PM, Jamie Johnson <jej2...@gmail.com> wrote:

> I brought the bad one down and back up and it did nothing. I can clear the
> index and try 4.2.1. I will save off the logs and see if there is anything
> else odd.
>
> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>
>> It would appear it's a bug given what you have said.
>>
>> Any other exceptions would be useful. Might be best to start tracking in a
>> JIRA issue as well.
>>
>> To fix, I'd bring the behind node down and back again.
>>
>> Unfortunately, I'm pressed for time, but we really need to get to the
>> bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
>> to mirrors now).
>>
>> - Mark
>>
>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>
>>> Sorry, I didn't ask the obvious question. Is there anything else that I
>>> should be looking for here, and is this a bug? I'd be happy to troll
>>> through the logs further if more information is needed, just let me know.
>>>
>>> Also, what is the most appropriate mechanism to fix this? Is it required
>>> to kill the index that is out of sync and let Solr resync things?
>>>
>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>>> Sorry for spamming here...
>>>>
>>>> shard5-core2 is the instance we're having issues with...
>>>>
>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>> SEVERE: shard update error StdNode:
>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
>>>> status:503, message:Service Unavailable
>>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>
>>>>> Here is another one that looks interesting:
>>>>>
>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>>>>> the leader, but locally we don't think so
>>>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>
>>>>>> Looking at the master, it looks like at some point there were shards
>>>>>> that went down. I am seeing things like what is below.
>>>>>>
>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>>>>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating...
>>>>>> (live nodes size: 12)
>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>>>>> INFO: Updating live nodes... (9)
>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>> INFO: Running the leader process.
>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>> INFO: Checking if I should try and be the leader.
>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>> INFO: I may be the new leader - try and sync
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>
>>>>>>> I don't think the versions you are thinking of apply here. PeerSync
>>>>>>> does not look at that - it looks at version numbers for updates in the
>>>>>>> transaction log - it compares the last 100 of them on leader and replica.
>>>>>>> What it's saying is that the replica seems to have versions that the
>>>>>>> leader does not. Have you scanned the logs for any interesting exceptions?
>>>>>>>
>>>>>>> Did the leader change during the heavy indexing? Did any zk session
>>>>>>> timeouts occur?
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
>>>>>>>> strange issue while testing today. Specifically, the replica has a
>>>>>>>> higher version than the master, which is causing the index to not
>>>>>>>> replicate. Because of this the replica has fewer documents than the
>>>>>>>> master. What could cause this, and how can I resolve it short of
>>>>>>>> taking down the index and scp'ing the right version in?
>>>>>>>>
>>>>>>>> MASTER:
>>>>>>>> Last Modified: about an hour ago
>>>>>>>> Num Docs: 164880
>>>>>>>> Max Doc: 164880
>>>>>>>> Deleted Docs: 0
>>>>>>>> Version: 2387
>>>>>>>> Segment Count: 23
>>>>>>>>
>>>>>>>> REPLICA:
>>>>>>>> Last Modified: about an hour ago
>>>>>>>> Num Docs: 164773
>>>>>>>> Max Doc: 164773
>>>>>>>> Deleted Docs: 0
>>>>>>>> Version: 3001
>>>>>>>> Segment Count: 30
>>>>>>>>
>>>>>>>> In the replica's log it says this:
>>>>>>>>
>>>>>>>> INFO: Creating new http client,
>>>>>>>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>
>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>> START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>>>>
>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>
>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>> Our versions are newer. ourLowThreshold=1431233788792274944
>>>>>>>> otherHigh=1431233789440294912
>>>>>>>>
>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>> DONE. sync succeeded
>>>>>>>>
>>>>>>>> which again seems to point to it thinking it has a newer version of
>>>>>>>> the index, so it aborts. This happened while 10 threads were indexing
>>>>>>>> 10,000 items, writing to a 6-shard (1 replica each) cluster. Any
>>>>>>>> thoughts on this or what I should look for would be appreciated.
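For reference, a minimal SolrJ sketch of one way the mismatch above shows up from the outside. It is not from the thread; it just queries each shard5 core directly with distrib=false so the per-replica doc counts (164880 vs 164773 here) can be compared. The URLs are the ones quoted in the logs, and HttpSolrServer is the 4.x client class that also appears in the stack traces; the class name and output format are otherwise arbitrary.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CompareShard5Replicas {
    public static void main(String[] args) throws SolrServerException {
        // Core URLs taken from the logs above: core1 is the leader, core2 the replica that is behind.
        String[] cores = {
            "http://10.38.33.16:7575/solr/dsc-shard5-core1",
            "http://10.38.33.17:7577/solr/dsc-shard5-core2"
        };
        for (String url : cores) {
            HttpSolrServer server = new HttpSolrServer(url);
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);                 // only the count is needed
            q.set("distrib", "false");    // ask this core alone, no fan-out to the rest of the cloud
            QueryResponse rsp = server.query(q);
            System.out.println(url + " numFound=" + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }
}

Run it with indexing paused; once the replica actually recovers, the two counts should converge.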