With these changes things are looking good. I'm up to 600,000 documents without any issues as of right now. I'll keep going and add more to see if I find anything.
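A quick way to sanity-check the totals after each batch is to compare numFound against the number of documents submitted (a sketch; the host, port, and collection name are taken from the startup command quoted later in this thread and may differ in your setup):

# Total document count across the collection; numFound should match the
# number of documents sent if nothing was dropped or duplicated.
curl "http://10.38.33.16:7575/solr/collection1/select?q=*:*&rows=0&wt=json"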
On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> ok, so that's not a deal breaker for me. I just changed it to match the shards
> that are auto created and it looks like things are happy. I'll go ahead and
> try my test to see if I can get things out of sync.

On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmil...@gmail.com> wrote:
> I had thought you could - but looking at the code recently, I don't think you
> can anymore. I think that's a technical limitation more than anything though.
> When these changes were made, I think support for that was simply not added
> at the time.
>
> I'm not sure exactly how straightforward it would be, but it seems doable -
> as it is, the overseer will preallocate shards when first creating the
> collection - that's when they get named shard(n). There would have to be
> logic to replace shard(n) with the custom shard name when the core actually
> registers.
>
> - Mark

On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> answered my own question, it now says compositeId. What is problematic though
> is that in addition to my shards (which are say jamie-shard1) I see the solr
> created shards (shard1). I assume that these were created because of the
> numShards param. Is there no way to specify the names of these shards?

On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> ah interesting.... so I need to specify num shards, blow out zk and then try
> this again to see if things work properly now. What is really strange is that
> for the most part things have worked right and on 4.2.1 I have 600,000 items
> indexed with no duplicates. In any event I will specify num shards, clear out
> zk and begin again. If this works properly, what should the router type be?

On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
> If you don't specify numShards after 4.1, you get an implicit doc router and
> it's up to you to distribute updates. In the past, partitioning was done on
> the fly - but for shard splitting and perhaps other features, we now divvy up
> the hash range up front based on numShards and store it in ZooKeeper. No
> numShards is now how you take complete control of updates yourself.
>
> - Mark

On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> The router says "implicit". I did start from a blank zk state but perhaps I
> missed one of the ZkCLI commands? One of my shards from the clusterstate.json
> is shown below. What is the process that should be done to bootstrap a
> cluster other than the ZkCLI commands I listed above? My process right now is
> to run those ZkCLI commands and then start solr on all of the instances with
> a command like this:
>
> java -server -Dshard=shard5 -DcoreName=shard5-core1 -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>
> I feel like maybe I'm missing a step.
>
> "shard5":{
>   "state":"active",
>   "replicas":{
>     "10.38.33.16:7575_solr_shard5-core1":{
>       "shard":"shard5",
>       "state":"active",
>       "core":"shard5-core1",
>       "collection":"collection1",
>       "node_name":"10.38.33.16:7575_solr",
>       "base_url":"http://10.38.33.16:7575/solr",
>       "leader":"true"},
>     "10.38.33.17:7577_solr_shard5-core2":{
>       "shard":"shard5",
>       "state":"recovering",
>       "core":"shard5-core2",
>       "collection":"collection1",
>       "node_name":"10.38.33.17:7577_solr",
>       "base_url":"http://10.38.33.17:7577/solr"}}}
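One way to see what the cluster actually has for the router and hash ranges is to dump clusterstate.json straight from ZooKeeper (a sketch; zkCli.sh is the stock ZooKeeper client, and the host is one of the zkHost entries from the startup command above):

# Dump the cluster state and look for the collection's router type and a
# "range" entry on each shard. With numShards set when the collection is
# created you would expect "router":"compositeId" and a hash range per shard;
# the shard5 excerpt above shows neither.
zkCli.sh -server so-zoo1:2181 get /clusterstate.json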
On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
> It should be part of your clusterstate.json. Some users have reported trouble
> upgrading a previous zk install when this change came. I recommended manually
> updating the clusterstate.json to have the right info, and that seemed to
> work. Otherwise, I guess you have to start from a clean zk state.
>
> If you don't have that range information, I think there will be trouble. Do
> you have a router type defined in the clusterstate.json?
>
> - Mark

On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Where is this information stored in ZK? I don't see it in the cluster state
> (or perhaps I don't understand it ;) ).
>
> Perhaps something with my process is broken. What I do when I start from
> scratch is the following:
>
> ZkCLI -cmd upconfig ...
> ZkCLI -cmd linkconfig ....
>
> but I don't ever explicitly create the collection. What should the steps from
> scratch be? I am moving from an unreleased snapshot of 4.0, so I never did
> that previously either - perhaps I did create the collection in one of my
> steps to get this working but have forgotten it along the way.

On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a
> collection is created - each shard gets a range, which is stored in
> ZooKeeper. You should not be able to end up with the same id on different
> shards - something very odd going on.
>
> Hopefully I'll have some time to try and help you reproduce. Ideally we can
> capture it in a test case.
>
> - Mark
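Since the collection is never created explicitly in the ZkCLI-only bootstrap above, the missing step is most likely the one that assigns those ranges. A sketch of creating it through the Collections API (numShards and replicationFactor mirror the 6-shard, 2-replica layout described later in the thread, and collection.configName matches the config uploaded with upconfig; adjust as needed):

# Create the collection up front so the overseer assigns a compositeId router
# and a hash range to each shard before any cores register.
curl "http://10.38.33.16:7575/solr/admin/collections?action=CREATE&name=collection1&numShards=6&replicationFactor=2&collection.configName=solr-conf"

# Alternatively, passing -DnumShards=6 when the very first node is started
# should trigger the same up-front allocation when the collection is created.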
On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> no, my thought was wrong - it appears that even with the parameter set I am
> seeing this behavior. I've been able to duplicate it on 4.2.0 by indexing
> 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so. I
> will try this on 4.2.1 to see if I see the same behavior.

On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Since I don't have that many items in my index I exported all of the keys for
> each shard and wrote a simple java program that checks for duplicates. I
> found some duplicate keys on different shards; a grep of the files for the
> keys found does indicate that they made it to the wrong places. If you
> notice, documents with the same ID are on shard 3 and shard 5. Is it possible
> that the hash is being calculated taking into account only the "live" nodes?
> I know that we don't specify the numShards param @ startup, so could this be
> what is happening?
>
> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
> shard1-core1:0
> shard1-core2:0
> shard2-core1:0
> shard2-core2:0
> shard3-core1:1
> shard3-core2:1
> shard4-core1:0
> shard4-core2:0
> shard5-core1:1
> shard5-core2:1
> shard6-core1:0
> shard6-core2:0
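For anyone wanting to repeat that check without writing a program, a rough equivalent using only the query API (a sketch; the field name "key" and the core/host names come from the examples in this thread, and distrib=false keeps each query on the local core rather than fanning out):

# Dump the unique key field from one core at a time, skipping the CSV header.
# Repeat per leader core, adjusting host/port to wherever that core lives.
curl -s "http://10.38.33.16:7575/solr/shard5-core1/select?q=*:*&fl=key&rows=1000000&wt=csv&distrib=false" | tail -n +2 > shard5-core1.keys

# Once every shard's leader has been dumped, any key printed here appears on
# more than one shard:
sort shard*-core1.keys | uniq -d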
On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Something interesting that I'm noticing as well: I just indexed 300,000
> items, and somehow 300,020 ended up in the index. I thought perhaps I messed
> something up so I started the indexing again and indexed another 400,000, and
> I see 400,064 docs. Is there a good way to find possible duplicates? I had
> tried to facet on key (our id field) but that didn't give me anything with
> more than a count of 1.

On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Ok, so clearing the transaction log allowed things to go again. I am going to
> clear the index and try to replicate the problem on 4.2.0 and then I'll try
> on 4.2.1.

On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
> No, not that I know of, which is why I say we need to get to the bottom of
> it.
>
> - Mark

On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Mark, is there a particular jira issue that you think may address this? I
> read through it quickly but didn't see one that jumped out.

On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
> I brought the bad one down and back up and it did nothing. I can clear the
> index and try 4.2.1. I will save off the logs and see if there is anything
> else odd.

On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
> It would appear it's a bug given what you have said.
>
> Any other exceptions would be useful. Might be best to start tracking in a
> JIRA issue as well.
>
> To fix, I'd bring the behind node down and back again.
>
> Unfortunately, I'm pressed for time, but we really need to get to the bottom
> of this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors
> now).
>
> - Mark

On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Sorry I didn't ask the obvious question. Is there anything else that I should
> be looking for here, and is this a bug? I'd be happy to troll through the
> logs further if more information is needed, just let me know.
>
> Also, what is the most appropriate mechanism to fix this? Is it required to
> kill the index that is out of sync and let solr resync things?
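If simply bouncing the behind replica doesn't trigger a recovery (as turned out to be the case above), the heavier-handed option mentioned in this thread - clearing the out-of-sync core and letting it resync - looks roughly like this (a sketch; the paths follow the -Dsolr.data.dir layout from the startup command, so verify them for your core before deleting anything):

# With the node hosting the out-of-sync replica stopped, remove that core's
# index and transaction log so it performs a full recovery from the leader
# when it rejoins. Double-check the data dir before running rm.
rm -rf /solr/data/shard5-core2/index /solr/data/shard5-core2/tlog
# Then restart the node with the same java ... -jar start.jar command as before.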
On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> sorry for spamming here....
>
> shard5-core2 is the instance we're having issues with...
>
> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> SEVERE: shard update error StdNode:
> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
> status:503, message:Service Unavailable
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> here is another one that looks interesting
>
> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the
> leader, but locally we don't think so
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)

On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Looking at the master it looks like at some point there were shards that went
> down. I am seeing things like what is below.
>
> INFO: A cluster state change: WatchedEvent state:SyncConnected
> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
> nodes size: 12)
> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> INFO: Updating live nodes... (9)
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> INFO: Running the leader process.
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> INFO: Checking if I should try and be the leader.
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> INFO: My last published State was Active, it's okay to be the leader.
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> INFO: I may be the new leader - try and sync

On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
> I don't think the versions you are thinking of apply here. PeerSync does not
> look at that - it looks at version numbers for updates in the transaction log
> - it compares the last 100 of them on leader and replica. What it's saying is
> that the replica seems to have versions that the leader does not. Have you
> scanned the logs for any interesting exceptions?
>
> Did the leader change during the heavy indexing? Did any zk session timeouts
> occur?
>
> - Mark
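The version lists PeerSync compares can also be inspected by hand through the realtime get handler, which is what the sync uses under the covers (a sketch; assumes /get is registered as in the stock solrconfig, and uses the two shard5 cores from the logs in this thread):

# Ask each core for the most recent update versions in its transaction log -
# the same lists PeerSync compares between leader and replica.
curl "http://10.38.33.16:7575/solr/dsc-shard5-core1/get?getVersions=100&wt=json"
curl "http://10.38.33.17:7577/solr/dsc-shard5-core2/get?getVersions=100&wt=json"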
On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> I am currently looking at moving our Solr cluster to 4.2 and noticed a
> strange issue while testing today. Specifically, the replica has a higher
> version than the master, which is causing the index to not replicate. Because
> of this the replica has fewer documents than the master. What could cause
> this, and how can I resolve it short of taking down the index and scp'ing the
> right version in?
>
> MASTER:
> Last Modified: about an hour ago
> Num Docs: 164880
> Max Doc: 164880
> Deleted Docs: 0
> Version: 2387
> Segment Count: 23
>
> REPLICA:
> Last Modified: about an hour ago
> Num Docs: 164773
> Max Doc: 164773
> Deleted Docs: 0
> Version: 3001
> Segment Count: 30
>
> in the replica's log it says this:
>
> INFO: Creating new http client,
> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START
> replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> Our versions are newer. ourLowThreshold=1431233788792274944
> otherHigh=1431233789440294912
>
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE.
> sync succeeded
>
> which again seems to point to it thinking it has a newer version of the
> index, so it aborts. This happened while having 10 threads indexing 10,000
> items each, writing to a 6 shard (1 replica each) cluster. Any thoughts on
> this or what I should look for would be appreciated.
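Following up on the question about leader changes and zk session timeouts during the run, a rough way to scan for them across the node logs (a sketch; the log location is a placeholder, and the patterns are based on the log lines quoted above plus typical ZooKeeper session-expiry messages, so adjust both for your setup):

# Look for session expirations and leader elections that happened while the
# 10 indexing threads were running.
grep -iE "session expired|ShardLeaderElectionContext|I may be the new leader" /path/to/solr/logs/*.log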