OK, so that's not a deal breaker for me. I just changed my shard names to match the ones that are auto-created, and things look happy now. I'll go ahead and run my test to see whether I can get things out of sync.
On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmil...@gmail.com> wrote:
> I had thought you could - but looking at the code recently, I don't think you can anymore. I think that's a technical limitation more than anything though. When these changes were made, I think support for that was simply not added at the time.
>
> I'm not sure exactly how straightforward it would be, but it seems doable - as it is, the overseer will preallocate shards when first creating the collection - that's when they get named shard(n). There would have to be logic to replace shard(n) with the custom shard name when the core actually registers.
>
> - Mark
>
> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> answered my own question, it now says compositeId. What is problematic though is that in addition to my shards (which are say jamie-shard1) I see the solr created shards (shard1). I assume that these were created because of the numShards param. Is there no way to specify the names of these shards?
>>
>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>
>>> ah interesting....so I need to specify num shards, blow out zk and then try this again to see if things work properly now. What is really strange is that for the most part things have worked right and on 4.2.1 I have 600,000 items indexed with no duplicates. In any event I will specify num shards, clear out zk and begin again. If this works properly what should the router type be?
>>>
>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>
>>>> If you don't specify numShards after 4.1, you get an implicit doc router and it's up to you to distribute updates. In the past, partitioning was done on the fly - but for shard splitting and perhaps other features, we now divvy up the hash range up front based on numShards and store it in ZooKeeper. No numShards is now how you take complete control of updates yourself.
>>>>
>>>> - Mark
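
To make the up-front partitioning Mark describes concrete, here is a rough sketch of the idea: each shard owns a fixed slice of the 32-bit hash space (the ranges the overseer stores in ZooKeeper when the collection is created), and a document id hashes into exactly one slice. This is not Solr's router code - the real compositeId router hashes ids with a MurmurHash3 variant, and the hash below is only a stand-in - but it shows why a given id should always land on the same shard once the ranges are fixed.

    import java.util.ArrayList;
    import java.util.List;

    /** Toy illustration of up-front hash-range partitioning; not Solr's actual compositeId router. */
    public class HashRangeSketch {

        static class Range {
            final int min, max;                       // inclusive bounds within the 32-bit hash space
            Range(int min, int max) { this.min = min; this.max = max; }
            boolean includes(int hash) { return hash >= min && hash <= max; }
        }

        /** Split the full 32-bit space into numShards contiguous ranges, one per shard. */
        static List<Range> partition(int numShards) {
            List<Range> ranges = new ArrayList<Range>();
            long span = (1L << 32) / numShards;
            long start = Integer.MIN_VALUE;
            for (int i = 0; i < numShards; i++) {
                long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + span - 1;
                ranges.add(new Range((int) start, (int) end));
                start = end + 1;
            }
            return ranges;
        }

        public static void main(String[] args) {
            List<Range> ranges = partition(6);        // 6 shards, as in the cluster in this thread
            String id = "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de";
            int hash = id.hashCode();                 // stand-in hash; Solr uses MurmurHash3 on the id
            for (int i = 0; i < ranges.size(); i++) {
                if (ranges.get(i).includes(hash)) {
                    System.out.println(id + " -> shard" + (i + 1));
                }
            }
        }
    }

Once the ranges are stored, every node routes a given id the same way regardless of which nodes happen to be live, which is why the up-front assignment matters.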
>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>
>>>>> The router says "implicit". I did start from a blank zk state but perhaps I missed one of the ZkCLI commands? One of my shards from the clusterstate.json is shown below. What is the process that should be done to bootstrap a cluster other than the ZkCLI commands I listed above? My process right now is run those ZkCLI commands and then start solr on all of the instances with a command like this
>>>>>
>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1 -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>>>>>
>>>>> I feel like maybe I'm missing a step.
>>>>>
>>>>> "shard5":{
>>>>>   "state":"active",
>>>>>   "replicas":{
>>>>>     "10.38.33.16:7575_solr_shard5-core1":{
>>>>>       "shard":"shard5",
>>>>>       "state":"active",
>>>>>       "core":"shard5-core1",
>>>>>       "collection":"collection1",
>>>>>       "node_name":"10.38.33.16:7575_solr",
>>>>>       "base_url":"http://10.38.33.16:7575/solr",
>>>>>       "leader":"true"},
>>>>>     "10.38.33.17:7577_solr_shard5-core2":{
>>>>>       "shard":"shard5",
>>>>>       "state":"recovering",
>>>>>       "core":"shard5-core2",
>>>>>       "collection":"collection1",
>>>>>       "node_name":"10.38.33.17:7577_solr",
>>>>>       "base_url":"http://10.38.33.17:7577/solr"}}}
>>>>>
>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>
>>>>>> It should be part of your clusterstate.json. Some users have reported trouble upgrading a previous zk install when this change came. I recommended manually updating the clusterstate.json to have the right info, and that seemed to work. Otherwise, I guess you have to start from a clean zk state.
>>>>>>
>>>>>> If you don't have that range information, I think there will be trouble. Do you have an router type defined in the clusterstate.json?
>>>>>>
>>>>>> - Mark
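
A quick way to check what actually got stored is to dump /clusterstate.json straight out of ZooKeeper and look for the "router" entry and a "range" on each shard. A minimal sketch using the plain ZooKeeper client (zkHost taken from the startup command above; the ZooKeeper client jar is assumed to be on the classpath):

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    /** Print /clusterstate.json so the router and per-shard range entries can be inspected. */
    public class DumpClusterState {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("so-zoo1:2181,so-zoo2:2181,so-zoo3:2181", 15000, new Watcher() {
                public void process(WatchedEvent event) {
                    if (event.getState() == Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                }
            });
            connected.await();                                  // wait until the session is established
            byte[] data = zk.getData("/clusterstate.json", false, null);
            System.out.println(new String(data, "UTF-8"));
            zk.close();
        }
    }

A shard entry created the pre-allocated way should carry a "range" value alongside "state" and "replicas"; the shard5 entry above has none, which matches the "implicit" router being reported.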
>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>
>>>>>>> Where is this information stored in ZK? I don't see it in the cluster state (or perhaps I don't understand it ;) ).
>>>>>>>
>>>>>>> Perhaps something with my process is broken. What I do when I start from scratch is the following
>>>>>>>
>>>>>>> ZkCLI -cmd upconfig ...
>>>>>>> ZkCLI -cmd linkconfig ....
>>>>>>>
>>>>>>> but I don't ever explicitly create the collection. What should the steps from scratch be? I am moving from an unreleased snapshot of 4.0 so I never did that previously either so perhaps I did create the collection in one of my steps to get this working but have forgotten it along the way.
>>>>>>>
>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a collection is created - each shard gets a range, which is stored in zookeeper. You should not be able to end up with the same id on different shards - something very odd going on.
>>>>>>>>
>>>>>>>> Hopefully I'll have some time to try and help you reproduce. Ideally we can capture it in a test case.
>>>>>>>>
>>>>>>>> - Mark
>>>>>>>>
>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> no, my thought was wrong, it appears that even with the parameter set I am seeing this behavior. I've been able to duplicate it on 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so. I will try this on 4.2.1 to see if I see the same behavior
>>>>>>>>>
>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Since I don't have that many items in my index I exported all of the keys for each shard and wrote a simple java program that checks for duplicates. I found some duplicate keys on different shards, a grep of the files for the keys found does indicate that they made it to the wrong places. If you notice documents with the same ID are on shard 3 and shard 5. Is it possible that the hash is being calculated taking into account only the "live" nodes? I know that we don't specify the numShards param @ startup so could this be what is happening?
>>>>>>>>>>
>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>>>>>>>>>> shard1-core1:0
>>>>>>>>>> shard1-core2:0
>>>>>>>>>> shard2-core1:0
>>>>>>>>>> shard2-core2:0
>>>>>>>>>> shard3-core1:1
>>>>>>>>>> shard3-core2:1
>>>>>>>>>> shard4-core1:0
>>>>>>>>>> shard4-core2:0
>>>>>>>>>> shard5-core1:1
>>>>>>>>>> shard5-core2:1
>>>>>>>>>> shard6-core1:0
>>>>>>>>>> shard6-core2:0
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Something interesting that I'm noticing as well, I just indexed 300,000 items, and some how 300,020 ended up in the index. I thought perhaps I messed something up so I started the indexing again and indexed another 400,000 and I see 400,064 docs. Is there a good way to find possibile duplicates? I had tried to facet on key (our id field) but that didn't give me anything with more than a count of 1.
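
For reference, a checker along the lines of the one Jamie describes could look roughly like this. It assumes one exported key file per core in a single directory, named like the grep output above (shard1-core1, shard1-core2, ...), with one key per line; replicas of the same shard are collapsed so only keys that show up under more than one shard are reported:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    /** Reports keys that appear in more than one shard's exported key file. */
    public class DuplicateKeyChecker {
        public static void main(String[] args) throws Exception {
            File dir = new File(args.length > 0 ? args[0] : ".");
            Map<String, Set<String>> keyToShards = new HashMap<String, Set<String>>();
            for (File f : dir.listFiles()) {
                if (!f.isFile()) continue;
                // Collapse shardN-core1/shardN-core2 into shardN so replicas don't count as duplicates.
                String shard = f.getName().replaceAll("-core\\d+$", "");
                BufferedReader in = new BufferedReader(new FileReader(f));
                String key;
                while ((key = in.readLine()) != null) {
                    key = key.trim();
                    if (key.length() == 0) continue;
                    Set<String> shards = keyToShards.get(key);
                    if (shards == null) {
                        shards = new HashSet<String>();
                        keyToShards.put(key, shards);
                    }
                    shards.add(shard);
                }
                in.close();
            }
            for (Map.Entry<String, Set<String>> e : keyToShards.entrySet()) {
                if (e.getValue().size() > 1) {
                    System.out.println(e.getKey() + " found in " + e.getValue());
                }
            }
        }
    }

This also explains why faceting on the key field showed nothing over a count of 1: a distributed facet merges counts per shard, so a key present once on two different shards can still report as 1 in some cases, whereas comparing the per-shard exports makes the cross-shard duplicate visible.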
>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to go again. I am going to clear the index and try to replicate the problem on 4.2.0 and then I'll try on 4.2.1
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> No, not that I know if, which is why I say we need to get to the bottom of it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>> It's there a particular jira issue that you think may address this? I read through it quickly but didn't see one that jumped out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing. I can clear the index and try 4.2.1. I will save off the logs and see if there is anything else odd
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best to start tracking in a JIRA issue as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to get to the bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is there anything else that I should be looking for here and is this a bug? I'd be happy to troll through the logs further if more information is needed, just let me know.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to fix this. Is it required to kill the index that is out of sync and let solr resync things?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> sorry for spamming here....
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
>>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> here is another one that looks interesting
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any zk session timeouts occur?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Mark
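
As a toy illustration of the comparison Mark is describing (and of what the PeerSync log lines in the original message below are complaining about), the essential question is whether the replica's recent update versions include any that the leader never saw. This is not Solr's PeerSync code, and the version numbers are made up:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    /** Toy version-list comparison; not Solr's PeerSync implementation. */
    public class VersionDiffSketch {
        public static void main(String[] args) {
            // Pretend these are the most recent update versions from each core's transaction log.
            List<Long> leaderVersions  = Arrays.asList(105L, 104L, 103L, 102L, 101L);
            List<Long> replicaVersions = Arrays.asList(107L, 106L, 105L, 104L, 103L);

            List<Long> onlyOnReplica = new ArrayList<Long>(replicaVersions);
            onlyOnReplica.removeAll(leaderVersions);

            // A non-empty result means the replica claims updates the leader does not have.
            System.out.println("Versions only on replica: " + onlyOnReplica);
        }
    }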
>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically the replica has a higher version than the master which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this and how can I resolve it short of taking down the index and scping the right version in?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> MASTER:
>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>>> Num Docs: 164880
>>>>>>>>>>>>>>>>>>>>>> Max Doc: 164880
>>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>>>>>>>>>>>>> Version: 2387
>>>>>>>>>>>>>>>>>>>>>> Segment Count: 23
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> REPLICA:
>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>>> Num Docs: 164773
>>>>>>>>>>>>>>>>>>>>>> Max Doc: 164773
>>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>>>>>>>>>>>>> Version: 3001
>>>>>>>>>>>>>>>>>>>>>> Segment Count: 30
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> in the replicas log it says this:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it has a newer version of the index so it aborts. This happened while having 10 threads indexing 10,000 items writing to a 6 shard (1 replica each) cluster. Any thoughts on this or what I should look for would be appreciated.
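
As a side note on diagnosing this kind of drift: querying each core directly with distrib=false returns only what that one core holds, without the request fanning out across the cluster, which makes it easy to compare leader and replica document counts. A rough SolrJ sketch, assuming SolrJ 4.x on the classpath and reusing the two shard5 core URLs from the thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    /** Compare per-core document counts by querying each core directly (distrib=false). */
    public class CoreDocCounts {
        public static void main(String[] args) throws Exception {
            String[] coreUrls = {
                "http://10.38.33.16:7575/solr/dsc-shard5-core1",
                "http://10.38.33.17:7577/solr/dsc-shard5-core2"
            };
            for (String url : coreUrls) {
                HttpSolrServer server = new HttpSolrServer(url);
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);
                q.set("distrib", "false");    // ask only this core, don't fan out to the collection
                QueryResponse rsp = server.query(q);
                System.out.println(url + " numFound=" + rsp.getResults().getNumFound());
                server.shutdown();
            }
        }
    }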