Something interesting that I'm noticing as well: I just indexed 300,000 items, and somehow 300,020 ended up in the index. I thought perhaps I had messed something up, so I started the indexing again, indexed another 400,000, and now see 400,064 docs. Is there a good way to find possible duplicates? I tried faceting on key (our id field), but that didn't give me anything with a count greater than 1.
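
For reference, this is roughly the facet check I have in mind (a minimal SolrJ sketch rather than our actual code; the base URL is a placeholder, "key" is our uniqueKey field, and the important parts are facet.mincount=2 and facet.limit=-1):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FindDuplicateKeys {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at any node hosting the collection.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);            // we only care about the facet counts, not the docs
        q.setFacet(true);
        q.addFacetField("key");  // our uniqueKey field
        q.setFacetMinCount(2);   // only interested in values that appear more than once
        q.setFacetLimit(-1);     // don't stop at the default facet.limit

        QueryResponse rsp = server.query(q);
        FacetField keys = rsp.getFacetField("key");
        if (keys.getValues() != null) {
            for (FacetField.Count c : keys.getValues()) {
                System.out.println(c.getName() + " -> " + c.getCount());
            }
        }
        server.shutdown();
    }
}

Faceting with facet.limit=-1 over a few hundred thousand distinct keys is heavy, so it may take a while; if it still shows nothing above 1, the per-core count check sketched at the bottom of this mail (after the quoted thread) is another way to see where the extra docs are landing.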
On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Ok, so clearing the transaction log allowed things to go again. I am going to clear the index and try to replicate the problem on 4.2.0, and then I'll try on 4.2.1.
>
> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
>> No, not that I know of, which is why I say we need to get to the bottom of it.
>>
>> - Mark
>>
>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> Mark,
>>> Is there a particular JIRA issue that you think may address this? I read through it quickly but didn't see one that jumped out.
>>>
>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>>>> I brought the bad one down and back up and it did nothing. I can clear the index and try 4.2.1. I will save off the logs and see if there is anything else odd.
>>>>
>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>>>> It would appear it's a bug given what you have said.
>>>>>
>>>>> Any other exceptions would be useful. Might be best to start tracking in a JIRA issue as well.
>>>>>
>>>>> To fix, I'd bring the behind node down and back again.
>>>>>
>>>>> Unfortunately, I'm pressed for time, but we really need to get to the bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now).
>>>>>
>>>>> - Mark
>>>>>
>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> Sorry, I didn't ask the obvious question. Is there anything else that I should be looking for here, and is this a bug? I'd be happy to troll through the logs further if more information is needed, just let me know.
>>>>>>
>>>>>> Also, what is the most appropriate mechanism to fix this? Is it required to kill the index that is out of sync and let Solr resync things?
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>> Sorry for spamming here... shard5-core2 is the instance we're having issues with:
>>>>>>>
>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
>>>>>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>>         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>>>>         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>>>>
>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>> Here is another one that looks interesting:
>>>>>>>>
>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
>>>>>>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>>>>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>>>>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>>>>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>>>>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>>>>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>>>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>>>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>>>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>>>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>
>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>> Looking at the master, it looks like at some point there were shards that went down. I am seeing things like what is below.
>>>>>>>>>
>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>>>>>>>> INFO: Updating live nodes... (9)
>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>> INFO: Running the leader process.
>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>> INFO: Checking if I should try and be the leader.
>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>> INFO: I may be the new leader - try and sync
>>>>>>>>>
>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>> I don't think the versions you are thinking of apply here. PeerSync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions?
>>>>>>>>>>
>>>>>>>>>> Did the leader change during the heavy indexing? Did any zk session timeouts occur?
>>>>>>>>>>
>>>>>>>>>> - Mark
>>>>>>>>>>
>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically, the replica has a higher version than the master, which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this, and how can I resolve it short of taking down the index and scp'ing the right version in?
>>>>>>>>>>>
>>>>>>>>>>> MASTER:
>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>> Num Docs: 164880
>>>>>>>>>>> Max Doc: 164880
>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>> Version: 2387
>>>>>>>>>>> Segment Count: 23
>>>>>>>>>>>
>>>>>>>>>>> REPLICA:
>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>> Num Docs: 164773
>>>>>>>>>>> Max Doc: 164773
>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>> Version: 3001
>>>>>>>>>>> Segment Count: 30
>>>>>>>>>>>
>>>>>>>>>>> In the replica's log it says this:
>>>>>>>>>>>
>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>>>>>>>>>>
>>>>>>>>>>> which again seems to point to it thinking it has a newer version of the index, so it aborts. This happened while having 10 threads indexing 10,000 items, writing to a 6-shard (1 replica each) cluster. Any thoughts on this or what I should look for would be appreciated.
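
For completeness, the per-core count check mentioned above: a minimal SolrJ sketch (assuming SolrJ 4.x) that queries each core directly with distrib=false so the counts aren't merged across the cluster; the two core URLs are the ones from the logs in the quoted thread.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CompareCoreCounts {
    public static void main(String[] args) throws Exception {
        // Core URLs taken from the logs earlier in the thread.
        String[] cores = {
            "http://10.38.33.16:7575/solr/dsc-shard5-core1",
            "http://10.38.33.17:7577/solr/dsc-shard5-core2"
        };

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);               // only the count matters
        q.set("distrib", "false");  // keep the query local to the core it is sent to

        for (String url : cores) {
            HttpSolrServer server = new HttpSolrServer(url);
            long numFound = server.query(q).getResults().getNumFound();
            System.out.println(url + " -> " + numFound);
            server.shutdown();
        }
    }
}

Comparing those numbers core by core (leader vs. replica for each shard) should at least show which cores disagree and whether the extra documents are concentrated on particular shards.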