OK, so that's not a deal-breaker for me.  I just changed it to match the
shards that are auto-created, and it looks like things are happy.  I'll go
ahead and run my test to see if I can get things out of sync.


On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmil...@gmail.com> wrote:

> I had thought you could - but looking at the code recently, I don't think
> you can anymore. I think that's a technical limitation more than anything
> though. When these changes were made, I think support for that was simply
> not added at the time.
>
> I'm not sure exactly how straightforward it would be, but it seems doable
> - as it is, the overseer will preallocate shards when first creating the
> collection - that's when they get named shard(n). There would have to be
> logic to replace shard(n) with the custom shard name when the core actually
> registers.
>
> - Mark
>
> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > answered my own question, it now says compositeId.  What is
> > problematic though is that in addition to my shards (which are, say,
> > jamie-shard1) I see the Solr-created shards (shard1).  I assume that
> > these were created because of the numShards param.  Is there no way
> > to specify the names of these shards?
> >
> >
> > On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >
> >> Ah, interesting... so I need to specify numShards, blow out ZK, and
> >> then try this again to see if things work properly now.  What is
> >> really strange is that for the most part things have worked right,
> >> and on 4.2.1 I have 600,000 items indexed with no duplicates.  In any
> >> event, I will specify numShards, clear out ZK, and begin again.  If
> >> this works properly, what should the router type be?
> >>
> >>
> >> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>
> >>> If you don't specify numShards after 4.1, you get an implicit doc
> >>> router and it's up to you to distribute updates. In the past,
> >>> partitioning was done on the fly - but for shard splitting and
> >>> perhaps other features, we now divvy up the hash range up front
> >>> based on numShards and store it in ZooKeeper. No numShards is now
> >>> how you take complete control of updates yourself.
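> >>>
> >>> To picture the "divvy up the hash range up front" part, here is a
> >>> rough sketch of the idea in Java. It is not Solr's actual code (the
> >>> real compositeId router hashes the uniqueKey with MurmurHash3 and
> >>> reads the per-shard ranges the overseer stored in clusterstate.json);
> >>> it only shows how choosing numShards fixes the ranges once, up front:
> >>>
> >>> // Conceptual sketch only; not Solr's implementation.
> >>> public class RangeSketch {
> >>>     public static void main(String[] args) {
> >>>         int numShards = 6;
> >>>         long ringSize = 1L << 32;           // size of the 32-bit hash ring
> >>>         long step = (ringSize + numShards - 1) / numShards;
> >>>         long start = Integer.MIN_VALUE;     // ring runs from MIN_VALUE to MAX_VALUE
> >>>
> >>>         String id = "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de";
> >>>         int hash = id.hashCode();           // stand-in hash, not MurmurHash3
> >>>         int shard = (int) (((long) hash - start) / step);  // 0 .. numShards-1
> >>>         System.out.println(id + " -> shard" + (shard + 1));
> >>>     }
> >>> }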
> >>>
> >>> - Mark
> >>>
> >>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>
> >>>> The router says "implicit".  I did start from a blank ZK state, but
> >>>> perhaps I missed one of the ZkCLI commands?  One of my shards from
> >>>> the clusterstate.json is shown below.  What is the process that
> >>>> should be done to bootstrap a cluster, other than the ZkCLI commands
> >>>> I listed above?  My process right now is to run those ZkCLI commands
> >>>> and then start Solr on all of the instances with a command like this:
> >>>>
> >>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
> >>>> -Dsolr.data.dir=/solr/data/shard5-core1
> >>>> -Dcollection.configName=solr-conf -Dcollection=collection1
> >>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
> >>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
> >>>>
> >>>> I feel like maybe I'm missing a step.
> >>>>
> >>>> "shard5":{
> >>>>       "state":"active",
> >>>>       "replicas":{
> >>>>         "10.38.33.16:7575_solr_shard5-core1":{
> >>>>           "shard":"shard5",
> >>>>           "state":"active",
> >>>>           "core":"shard5-core1",
> >>>>           "collection":"collection1",
> >>>>           "node_name":"10.38.33.16:7575_solr",
> >>>>           "base_url":"http://10.38.33.16:7575/solr",
> >>>>           "leader":"true"},
> >>>>         "10.38.33.17:7577_solr_shard5-core2":{
> >>>>           "shard":"shard5",
> >>>>           "state":"recovering",
> >>>>           "core":"shard5-core2",
> >>>>           "collection":"collection1",
> >>>>           "node_name":"10.38.33.17:7577_solr",
> >>>>           "base_url":"http://10.38.33.17:7577/solr"}}}
> >>>>
> >>>>
> >>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>
> >>>>> It should be part of your clusterstate.json. Some users have
> >>>>> reported trouble upgrading a previous ZK install when this change
> >>>>> came. I recommended manually updating the clusterstate.json to have
> >>>>> the right info, and that seemed to work. Otherwise, I guess you have
> >>>>> to start from a clean ZK state.
> >>>>>
> >>>>> If you don't have that range information, I think there will be
> >>>>> trouble. Do you have a router type defined in the clusterstate.json?
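> >>>>>
> >>>>> For comparison, when a collection is created with numShards, each
> >>>>> shard entry in clusterstate.json carries a "range" field (and the
> >>>>> collection a "router" value). Roughly like this illustrative
> >>>>> two-shard fragment; the hex bounds come from splitting the 32-bit
> >>>>> hash space, and real values for six shards will differ:
> >>>>>
> >>>>> "shard1":{
> >>>>>     "range":"80000000-ffffffff",
> >>>>>     ...},
> >>>>> "shard2":{
> >>>>>     "range":"0-7fffffff",
> >>>>>     ...}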
> >>>>>
> >>>>> - Mark
> >>>>>
> >>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>
> >>>>>> Where is this information stored in ZK?  I don't see it in the
> >>>>>> cluster state (or perhaps I don't understand it ;) ).
> >>>>>>
> >>>>>> Perhaps something with my process is broken.  What I do when I
> >>>>>> start from scratch is the following:
> >>>>>>
> >>>>>> ZkCLI -cmd upconfig ...
> >>>>>> ZkCLI -cmd linkconfig ....
> >>>>>>
> >>>>>> but I don't ever explicitly create the collection.  What should the
> >>>>>> steps from scratch be?  I am moving from an unreleased snapshot of
> >>>>>> 4.0, so I never did that previously either; perhaps I did create the
> >>>>>> collection in one of my steps to get this working but have forgotten
> >>>>>> it along the way.
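> >>>>>>
> >>>>>> (For what it's worth, the collection can also be created explicitly
> >>>>>> through the Collections API instead of implicitly by the first cores
> >>>>>> that register; something along the lines of
> >>>>>> /admin/collections?action=CREATE&name=collection1&numShards=6&collection.configName=solr-conf
> >>>>>> against any running node. The parameter names here are from memory,
> >>>>>> so they are worth checking against the 4.2 Collections API docs.)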
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thanks for digging, Jamie. In 4.2, hash ranges are assigned up
> >>>>>>> front when a collection is created - each shard gets a range, which
> >>>>>>> is stored in ZooKeeper. You should not be able to end up with the
> >>>>>>> same id on different shards - something very odd is going on.
> >>>>>>>
> >>>>>>> Hopefully I'll have some time to try and help you reproduce.
> >>>>>>> Ideally we can capture it in a test case.
> >>>>>>>
> >>>>>>> - Mark
> >>>>>>>
> >>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> No, my thought was wrong; it appears that even with the parameter
> >>>>>>>> set I am seeing this behavior.  I've been able to duplicate it on
> >>>>>>>> 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each);
> >>>>>>>> it shows up when I get to 400,000 or so.  I will try this on 4.2.1
> >>>>>>>> to see if I see the same behavior.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Since I don't have that many items in my index, I exported all
> >>>>>>>>> of the keys for each shard and wrote a simple Java program that
> >>>>>>>>> checks for duplicates (a rough sketch of such a checker is shown
> >>>>>>>>> after the grep output below).  I found some duplicate keys on
> >>>>>>>>> different shards, and a grep of the files for the keys found does
> >>>>>>>>> indicate that they made it to the wrong places.  As you can see,
> >>>>>>>>> documents with the same ID are on shard 3 and shard 5.  Is it
> >>>>>>>>> possible that the hash is being calculated taking into account
> >>>>>>>>> only the "live" nodes?  I know that we don't specify the numShards
> >>>>>>>>> param at startup, so could this be what is happening?
> >>>>>>>>>
> >>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
> >>>>>>>>> shard1-core1:0
> >>>>>>>>> shard1-core2:0
> >>>>>>>>> shard2-core1:0
> >>>>>>>>> shard2-core2:0
> >>>>>>>>> shard3-core1:1
> >>>>>>>>> shard3-core2:1
> >>>>>>>>> shard4-core1:0
> >>>>>>>>> shard4-core2:0
> >>>>>>>>> shard5-core1:1
> >>>>>>>>> shard5-core2:1
> >>>>>>>>> shard6-core1:0
> >>>>>>>>> shard6-core2:0
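> >>>>>>>>>
> >>>>>>>>> For reference, a minimal sketch of that kind of duplicate checker
> >>>>>>>>> (not the exact program I ran; it assumes one exported key per line
> >>>>>>>>> in each file named on the command line):
> >>>>>>>>>
> >>>>>>>>> import java.io.BufferedReader;
> >>>>>>>>> import java.io.FileReader;
> >>>>>>>>> import java.util.HashMap;
> >>>>>>>>> import java.util.Map;
> >>>>>>>>>
> >>>>>>>>> // Reports any key that appears in more than one of the given
> >>>>>>>>> // files, i.e. a document id that landed on more than one shard.
> >>>>>>>>> public class DupCheck {
> >>>>>>>>>     public static void main(String[] args) throws Exception {
> >>>>>>>>>         Map<String, String> seen = new HashMap<String, String>();
> >>>>>>>>>         for (String file : args) {
> >>>>>>>>>             BufferedReader in = new BufferedReader(new FileReader(file));
> >>>>>>>>>             String key;
> >>>>>>>>>             while ((key = in.readLine()) != null) {
> >>>>>>>>>                 key = key.trim();
> >>>>>>>>>                 if (key.isEmpty()) continue;
> >>>>>>>>>                 String first = seen.put(key, file);
> >>>>>>>>>                 if (first != null && !first.equals(file)) {
> >>>>>>>>>                     System.out.println(key + " is in both " + first + " and " + file);
> >>>>>>>>>                 }
> >>>>>>>>>             }
> >>>>>>>>>             in.close();
> >>>>>>>>>         }
> >>>>>>>>>     }
> >>>>>>>>> }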
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Something interesting that I'm noticing as well: I just indexed
> >>>>>>>>>> 300,000 items, and somehow 300,020 ended up in the index.  I
> >>>>>>>>>> thought perhaps I messed something up, so I started the indexing
> >>>>>>>>>> again, indexed another 400,000, and I see 400,064 docs.  Is there
> >>>>>>>>>> a good way to find possible duplicates?  I had tried to facet on
> >>>>>>>>>> key (our id field), but that didn't give me anything with more
> >>>>>>>>>> than a count of 1.
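> >>>>>>>>>>
> >>>>>>>>>> (One thing that may be worth trying, assuming the id field is
> >>>>>>>>>> named "key": a facet query that only returns terms with a count
> >>>>>>>>>> above 1, so duplicates aren't buried by the default facet.limit.
> >>>>>>>>>> Something like
> >>>>>>>>>> /select?q=*:*&rows=0&facet=true&facet.field=key&facet.limit=-1&facet.mincount=2
> >>>>>>>>>> may surface them, though over a few hundred thousand unique keys
> >>>>>>>>>> it could be slow.)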
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> OK, so clearing the transaction log allowed things to go again.
> >>>>>>>>>>> I am going to clear the index and try to replicate the problem
> >>>>>>>>>>> on 4.2.0, and then I'll try on 4.2.1.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> No, not that I know of, which is why I say we need to get to
> >>>>>>>>>>>> the bottom of it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Mark
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Mark,
> >>>>>>>>>>>>> Is there a particular JIRA issue that you think may address
> >>>>>>>>>>>>> this?  I read through it quickly but didn't see one that
> >>>>>>>>>>>>> jumped out.
> >>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing.  I
> >>>>>>>>>>>>>> can clear the index and try 4.2.1.  I will save off the logs
> >>>>>>>>>>>>>> and see if there is anything else odd.
> >>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best to start
> >>>>>>>>>>>>>>> tracking in a JIRA issue as well.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to
> >>>>>>>>>>>>>>> get to the bottom of this and fix it, or determine if it's
> >>>>>>>>>>>>>>> fixed in 4.2.1 (spreading to mirrors now).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Sorry, I didn't ask the obvious question.  Is there anything
> >>>>>>>>>>>>>>>> else that I should be looking for here, and is this a bug?
> >>>>>>>>>>>>>>>> I'd be happy to troll through the logs further if more
> >>>>>>>>>>>>>>>> information is needed, just let me know.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Also, what is the most appropriate mechanism to fix this?
> >>>>>>>>>>>>>>>> Is it required to kill the index that is out of sync and let
> >>>>>>>>>>>>>>>> Solr resync things?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> sorry for spamming here....
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
> >>>>>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> >>>>>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>>>>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> >>>>>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>>>>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>>>>>>>>>>>>>>>>    at java.lang.Thread.run(Thread.java:662)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> here is another one that looks interesting
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> >>>>>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Looking at the master, it looks like at some point there were shards
> >>>>>>>>>>>>>>>>>>> that went down.  I am seeing things like what is below.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> >>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here. PeerSync
> >>>>>>>>>>>>>>>>>>>> does not look at that - it looks at version numbers for updates in
> >>>>>>>>>>>>>>>>>>>> the transaction log - it compares the last 100 of them on leader and
> >>>>>>>>>>>>>>>>>>>> replica. What it's saying is that the replica seems to have versions
> >>>>>>>>>>>>>>>>>>>> that the leader does not. Have you scanned the logs for any
> >>>>>>>>>>>>>>>>>>>> interesting exceptions?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any ZK session
> >>>>>>>>>>>>>>>>>>>> timeouts occur?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed
> >>>>>>>>>>>>>>>>>>>>> a strange issue while testing today.  Specifically, the replica has a
> >>>>>>>>>>>>>>>>>>>>> higher version than the master, which is causing the index to not
> >>>>>>>>>>>>>>>>>>>>> replicate.  Because of this the replica has fewer documents than the
> >>>>>>>>>>>>>>>>>>>>> master.  What could cause this, and how can I resolve it short of
> >>>>>>>>>>>>>>>>>>>>> taking down the index and scp'ing the right version in?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> MASTER:
> >>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
> >>>>>>>>>>>>>>>>>>>>> Num Docs: 164880
> >>>>>>>>>>>>>>>>>>>>> Max Doc: 164880
> >>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
> >>>>>>>>>>>>>>>>>>>>> Version: 2387
> >>>>>>>>>>>>>>>>>>>>> Segment Count: 23
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> REPLICA:
> >>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
> >>>>>>>>>>>>>>>>>>>>> Num Docs: 164773
> >>>>>>>>>>>>>>>>>>>>> Max Doc: 164773
> >>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
> >>>>>>>>>>>>>>>>>>>>> Version: 3001
> >>>>>>>>>>>>>>>>>>>>> Segment Count: 30
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In the replica's log it says this:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> which again seems to indicate that it thinks it has a newer version
> >>>>>>>>>>>>>>>>>>>>> of the index, so it aborts.  This happened while having 10 threads
> >>>>>>>>>>>>>>>>>>>>> each indexing 10,000 items, writing to a 6-shard (1 replica each)
> >>>>>>>>>>>>>>>>>>>>> cluster.  Any thoughts on this or what I should look for would be
> >>>>>>>>>>>>>>>>>>>>> appreciated.
