I had thought you could, but looking at the code recently, I don't think you can anymore. That's more a technical limitation than anything, though - when these changes were made, support for it simply wasn't added at the time.
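If you want to see exactly what the overseer assigned, it's easy enough to dump clusterstate.json straight out of ZooKeeper and check the shard names, the router, and the hash ranges. Here's a rough sketch using the plain ZooKeeper client - the connect string is just the zkHost value from the startup command quoted further down this thread, so substitute your own:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class DumpClusterState {
    public static void main(String[] args) throws Exception {
        // zkHost taken from the startup command quoted below - adjust for your cluster
        ZooKeeper zk = new ZooKeeper("so-zoo1:2181,so-zoo2:2181,so-zoo3:2181", 15000,
                new Watcher() {
                    public void process(WatchedEvent event) {
                        // no-op watcher; we only need a one-shot read
                    }
                });
        try {
            // clusterstate.json holds the shard names, router and hash ranges the overseer assigned
            byte[] data = zk.getData("/clusterstate.json", false, null);
            System.out.println(new String(data, "UTF-8"));
        } finally {
            zk.close();
        }
    }
}

For a collection created with numShards you should see each shard listed with its own hash range; with no numShards (the implicit router) there won't be any ranges and routing is entirely up to you.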
I'm not sure exactly how straightforward it would be, but it seems doable - as it is, the overseer will preallocate shards when first creating the collection - that's when they get named shard(n). There would have to be logic to replace shard(n) with the custom shard name when the core actually registers.

- Mark

On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:

> answered my own question, it now says compositeId. What is problematic though is that in addition to my shards (which are say jamie-shard1) I see the solr created shards (shard1). I assume that these were created because of the numShards param. Is there no way to specify the names of these shards?
>
> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> ah interesting....so I need to specify num shards, blow out zk and then try this again to see if things work properly now. What is really strange is that for the most part things have worked right and on 4.2.1 I have 600,000 items indexed with no duplicates. In any event I will specify num shards, clear out zk and begin again. If this works properly what should the router type be?
>>
>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>>> If you don't specify numShards after 4.1, you get an implicit doc router and it's up to you to distribute updates. In the past, partitioning was done on the fly - but for shard splitting and perhaps other features, we now divvy up the hash range up front based on numShards and store it in ZooKeeper. No numShards is now how you take complete control of updates yourself.
>>>
>>> - Mark
>>>
>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>>> The router says "implicit". I did start from a blank zk state but perhaps I missed one of the ZkCLI commands? One of my shards from the clusterstate.json is shown below. What is the process that should be done to bootstrap a cluster other than the ZkCLI commands I listed above? My process right now is run those ZkCLI commands and then start solr on all of the instances with a command like this
>>>>
>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
>>>> -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf
>>>> -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>>>>
>>>> I feel like maybe I'm missing a step.
>>>>
>>>> "shard5":{
>>>>   "state":"active",
>>>>   "replicas":{
>>>>     "10.38.33.16:7575_solr_shard5-core1":{
>>>>       "shard":"shard5",
>>>>       "state":"active",
>>>>       "core":"shard5-core1",
>>>>       "collection":"collection1",
>>>>       "node_name":"10.38.33.16:7575_solr",
>>>>       "base_url":"http://10.38.33.16:7575/solr",
>>>>       "leader":"true"},
>>>>     "10.38.33.17:7577_solr_shard5-core2":{
>>>>       "shard":"shard5",
>>>>       "state":"recovering",
>>>>       "core":"shard5-core2",
>>>>       "collection":"collection1",
>>>>       "node_name":"10.38.33.17:7577_solr",
>>>>       "base_url":"http://10.38.33.17:7577/solr"}}}
>>>>
>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>>> It should be part of your clusterstate.json. Some users have reported trouble upgrading a previous zk install when this change came. I recommended manually updating the clusterstate.json to have the right info, and that seemed to work. Otherwise, I guess you have to start from a clean zk state.
>>>>>
>>>>> If you don't have that range information, I think there will be trouble. Do you have a router type defined in the clusterstate.json?
>>>>>
>>>>> - Mark
>>>>>
>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>
>>>>>> Where is this information stored in ZK? I don't see it in the cluster state (or perhaps I don't understand it ;) ).
>>>>>>
>>>>>> Perhaps something with my process is broken. What I do when I start from scratch is the following
>>>>>>
>>>>>> ZkCLI -cmd upconfig ...
>>>>>> ZkCLI -cmd linkconfig ....
>>>>>>
>>>>>> but I don't ever explicitly create the collection. What should the steps from scratch be? I am moving from an unreleased snapshot of 4.0 so I never did that previously either, so perhaps I did create the collection in one of my steps to get this working but have forgotten it along the way.
>>>>>>
>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a collection is created - each shard gets a range, which is stored in zookeeper. You should not be able to end up with the same id on different shards - something very odd going on.
>>>>>>>
>>>>>>> Hopefully I'll have some time to try and help you reproduce. Ideally we can capture it in a test case.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>
>>>>>>>> no, my thought was wrong, it appears that even with the parameter set I am seeing this behavior. I've been able to duplicate it on 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so. I will try this on 4.2.1 to see if I see the same behavior
>>>>>>>>
>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Since I don't have that many items in my index I exported all of the keys for each shard and wrote a simple java program that checks for duplicates. I found some duplicate keys on different shards, and a grep of the files for the keys found does indicate that they made it to the wrong places. If you notice, documents with the same ID are on shard 3 and shard 5. Is it possible that the hash is being calculated taking into account only the "live" nodes? I know that we don't specify the numShards param @ startup so could this be what is happening?
>>>>>>>>>
>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>>>>>>>>> shard1-core1:0
>>>>>>>>> shard1-core2:0
>>>>>>>>> shard2-core1:0
>>>>>>>>> shard2-core2:0
>>>>>>>>> shard3-core1:1
>>>>>>>>> shard3-core2:1
>>>>>>>>> shard4-core1:0
>>>>>>>>> shard4-core2:0
>>>>>>>>> shard5-core1:1
>>>>>>>>> shard5-core2:1
>>>>>>>>> shard6-core1:0
>>>>>>>>> shard6-core2:0
>>>>>>>>>
>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Something interesting that I'm noticing as well, I just indexed 300,000 items, and somehow 300,020 ended up in the index. I thought perhaps I messed something up so I started the indexing again and indexed another 400,000 and I see 400,064 docs. Is there a good way to find possible duplicates? I had tried to facet on key (our id field) but that didn't give me anything with more than a count of 1.
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ok, so clearing the transaction log allowed things to go again. I am going to clear the index and try to replicate the problem on 4.2.0 and then I'll try on 4.2.1
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> No, not that I know of, which is why I say we need to get to the bottom of it.
>>>>>>>>>>>>
>>>>>>>>>>>> - Mark
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Mark
>>>>>>>>>>>>> Is there a particular jira issue that you think may address this? I read through it quickly but didn't see one that jumped out
>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing. I can clear the index and try 4.2.1. I will save off the logs and see if there is anything else odd
>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best to start tracking in a JIRA issue as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to get to the bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is there anything else that I should be looking for here and is this a bug? I'd be happy to troll through the logs further if more information is needed, just let me know.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to fix this. Is it required to kill the index that is out of sync and let solr resync things?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sorry for spamming here....
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> here is another one that looks interesting
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>>>>>>>>>>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any zk session timeouts occur?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically the replica has a higher version than the master which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this and how can I resolve it short of taking down the index and scping the right version in?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> MASTER:
>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>> Num Docs: 164880
>>>>>>>>>>>>>>>>>>>>> Max Doc: 164880
>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>>>>>>>>>>>> Version: 2387
>>>>>>>>>>>>>>>>>>>>> Segment Count: 23
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> REPLICA:
>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>> Num Docs: 164773
>>>>>>>>>>>>>>>>>>>>> Max Doc: 164773
>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>>>>>>>>>>>> Version: 3001
>>>>>>>>>>>>>>>>>>>>> Segment Count: 30
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> in the replicas log it says this:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it has a newer version of the index so it aborts. This happened while having 10 threads indexing 10,000 items writing to a 6 shard (1 replica each) cluster. Any thoughts on this or what I should look for would be appreciated.
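One more note on the duplicate hunting discussed above: rather than exporting the keys from every shard, you can query each core directly with distrib=false and compare the ids in code. A rough SolrJ sketch - the core URL and the "key" field name are lifted from this thread, and you would list one core per shard (replicas of the same shard are supposed to share ids), so adjust the list for your cluster:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class FindCrossShardDuplicates {
    public static void main(String[] args) throws Exception {
        // One core per shard (e.g. the leaders). dsc-shard5-core1 is from this thread;
        // add the corresponding core for each of your other shards.
        String[] cores = {
                "http://10.38.33.16:7575/solr/dsc-shard5-core1"
        };
        Map<String, String> seen = new HashMap<String, String>(); // id -> core it was seen on
        for (String coreUrl : cores) {
            HttpSolrServer server = new HttpSolrServer(coreUrl);
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("key");       // "key" is the id field mentioned in this thread
            q.setRows(1000000);       // fine for a small test index; page through for anything big
            q.set("distrib", false);  // ask this core only, not the whole collection
            QueryResponse rsp = server.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                String id = doc.getFieldValue("key").toString();
                String other = seen.put(id, coreUrl);
                if (other != null && !other.equals(coreUrl)) {
                    System.out.println("Duplicate id " + id + " on " + other + " and " + coreUrl);
                }
            }
            server.shutdown();
        }
    }
}

The same distrib=false parameter works from the browser too if you just want to spot-check a single id against one core at a time.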