I had thought you could - but looking at the code recently, I don't think you 
can anymore. I think that's a technical limitation more than anything, though; 
when these changes were made, support for custom shard names simply wasn't 
added.

I'm not sure exactly how straightforward it would be, but it seems doable - as 
it is, the overseer preallocates the shards when first creating the collection 
- that's when they get named shard(n). There would have to be logic to replace 
shard(n) with the custom shard name when the core actually registers.
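
To make that concrete (just a sketch, reusing the property names from the 
startup command further down the thread): with the implicit router you keep 
picking the shard name yourself when each core registers, e.g.

  java -server -Dshard=jamie-shard1 -DcoreName=jamie-shard1-core1 \
    -Dcollection=collection1 -Dcollection.configName=solr-conf \
    -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 -jar start.jar

With numShards set, though, the overseer has already written shard1..shardN 
into clusterstate.json before any core comes up, and as far as I can tell the 
shard property won't currently rename one of those preallocated slots - that's 
the bit of logic that would need to be added.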

- Mark

On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:

> answered my own question, it now says compositeId.  What is problematic,
> though, is that in addition to my shards (which are, say, jamie-shard1) I see
> the Solr-created shards (shard1).  I assume that these were created because
> of the numShards param.  Is there no way to specify the names of these
> shards?
> 
> 
> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> 
>> ah interesting... so I need to specify numShards, blow out zk, and then try
>> this again to see if things work properly now.  What is really strange is
>> that for the most part things have worked right, and on 4.2.1 I have 600,000
>> items indexed with no duplicates.  In any event I will specify numShards,
>> clear out zk, and begin again.  If this works properly, what should the
>> router type be?
>> 
>> 
>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> 
>>> If you don't specify numShards after 4.1, you get an implicit doc router
>>> and it's up to you to distribute updates. In the past, partitioning was
>>> done on the fly - but for shard splitting and perhaps other features, we
>>> now divvy up the hash range up front based on numShards and store it in
>>> ZooKeeper. No numShards is now how you take complete control of updates
>>> yourself.
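>>>
>>> In other words, to get the compositeId behavior you either pass
>>> -DnumShards=6 (or however many shards you want) when the collection is
>>> first created, or create the collection explicitly.  A sketch of the
>>> latter via the Collections API (the host/port and replicationFactor value
>>> here are just placeholders for your setup):
>>>
>>> curl "http://so-host:7575/solr/admin/collections?action=CREATE&name=collection1&numShards=6&replicationFactor=2&collection.configName=solr-conf"
>>>
>>> Either way the overseer should write a hash range for each shard into
>>> clusterstate.json before any documents are indexed.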
>>> 
>>> - Mark
>>> 
>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> 
>>>> The router says "implicit".  I did start from a blank zk state but perhaps
>>>> I missed one of the ZkCLI commands?  One of my shards from the
>>>> clusterstate.json is shown below.  What is the process that should be done
>>>> to bootstrap a cluster other than the ZkCLI commands I listed above?  My
>>>> process right now is to run those ZkCLI commands and then start solr on
>>>> all of the instances with a command like this
>>>>
>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
>>>> -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf
>>>> -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>>>>
>>>> I feel like maybe I'm missing a step.
>>>> 
>>>> "shard5":{
>>>>       "state":"active",
>>>>       "replicas":{
>>>>         "10.38.33.16:7575_solr_shard5-core1":{
>>>>           "shard":"shard5",
>>>>           "state":"active",
>>>>           "core":"shard5-core1",
>>>>           "collection":"collection1",
>>>>           "node_name":"10.38.33.16:7575_solr",
>>>>           "base_url":"http://10.38.33.16:7575/solr",
>>>>           "leader":"true"},
>>>>         "10.38.33.17:7577_solr_shard5-core2":{
>>>>           "shard":"shard5",
>>>>           "state":"recovering",
>>>>           "core":"shard5-core2",
>>>>           "collection":"collection1",
>>>>           "node_name":"10.38.33.17:7577_solr",
>>>>           "base_url":"http://10.38.33.17:7577/solr"}}}
>>>> 
>>>> 
>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>> 
>>>>> It should be part of your clusterstate.json. Some users have reported
>>>>> trouble upgrading a previous zk install when this change came. I
>>>>> recommended manually updating the clusterstate.json to have the right
>>>>> info, and that seemed to work. Otherwise, I guess you have to start from
>>>>> a clean zk state.
>>>>>
>>>>> If you don't have that range information, I think there will be trouble.
>>>>> Do you have a router type defined in the clusterstate.json?
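>>>>>
>>>>> For comparison, a healthy 4.2 clusterstate.json should carry a router for
>>>>> the collection and a range for each shard, roughly like this (the range
>>>>> values here are made up - the real ones depend on numShards):
>>>>>
>>>>>   "collection1":{
>>>>>     "shards":{
>>>>>       "shard1":{
>>>>>         "range":"80000000-aaa9ffff",
>>>>>         "state":"active",
>>>>>         ...
>>>>>     "router":"compositeId"}
>>>>>
>>>>> If the shards have no range, or the router is implicit, Solr won't
>>>>> partition updates for you.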
>>>>> 
>>>>> - Mark
>>>>> 
>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>> 
>>>>>> Where is this information stored in ZK?  I don't see it in the cluster
>>>>>> state (or perhaps I don't understand it ;) ).
>>>>>>
>>>>>> Perhaps something with my process is broken.  What I do when I start
>>>>>> from scratch is the following
>>>>>>
>>>>>> ZkCLI -cmd upconfig ...
>>>>>> ZkCLI -cmd linkconfig ....
>>>>>>
>>>>>> but I don't ever explicitly create the collection.  What should the
>>>>>> steps from scratch be?  I am moving from an unreleased snapshot of 4.0,
>>>>>> so I never did that previously either; perhaps I did create the
>>>>>> collection in one of my steps to get this working but have forgotten it
>>>>>> along the way.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>> 
>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front
>>>>>>> when a collection is created - each shard gets a range, which is stored
>>>>>>> in zookeeper. You should not be able to end up with the same id on
>>>>>>> different shards - something very odd is going on.
>>>>>>>
>>>>>>> Hopefully I'll have some time to try and help you reproduce. Ideally we
>>>>>>> can capture it in a test case.
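>>>>>>>
>>>>>>> To illustrate why that shouldn't happen - this is a toy sketch, not
>>>>>>> Solr's actual routing code (I believe Solr uses a murmur3 hash of the
>>>>>>> id) - the shard is a pure function of the id and the fixed ranges in
>>>>>>> clusterstate.json, so it can't depend on which nodes happen to be live:
>>>>>>>
>>>>>>> public class RouteSketch {
>>>>>>>   // Stand-in hash; Solr's real router hashes the id differently.
>>>>>>>   static long toyHash(String id) {
>>>>>>>     return id.hashCode() & 0xffffffffL; // unsigned 32-bit value
>>>>>>>   }
>>>>>>>   // Map the hash into one of numShards equal, contiguous slices.
>>>>>>>   static int shardFor(String id, int numShards) {
>>>>>>>     long sliceSize = (1L << 32) / numShards;
>>>>>>>     return (int) Math.min(toyHash(id) / sliceSize, numShards - 1);
>>>>>>>   }
>>>>>>>   public static void main(String[] args) {
>>>>>>>     String id = "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de";
>>>>>>>     // Always prints the same shard for the same id and shard count.
>>>>>>>     System.out.println(id + " -> shard" + (shardFor(id, 6) + 1));
>>>>>>>   }
>>>>>>> }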
>>>>>>> 
>>>>>>> - Mark
>>>>>>> 
>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> no, my thought was wrong, it appears that even with the parameter set
>>>>>>>> I am seeing this behavior.  I've been able to duplicate it on 4.2.0 by
>>>>>>>> indexing 100,000 documents on 10 threads (10,000 each) when I get to
>>>>>>>> 400,000 or so.  I will try this on 4.2.1 to see if I see the same
>>>>>>>> behavior.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Since I don't have that many items in my index, I exported all of the
>>>>>>>>> keys for each shard and wrote a simple java program that checks for
>>>>>>>>> duplicates.  I found some duplicate keys on different shards, and a
>>>>>>>>> grep of the files for the keys found does indicate that they made it
>>>>>>>>> to the wrong places.  If you notice, documents with the same ID are on
>>>>>>>>> shard 3 and shard 5.  Is it possible that the hash is being calculated
>>>>>>>>> taking into account only the "live" nodes?  I know that we don't
>>>>>>>>> specify the numShards param @ startup, so could this be what is
>>>>>>>>> happening?
>>>>>>>>> 
>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>>>>>>>>> shard1-core1:0
>>>>>>>>> shard1-core2:0
>>>>>>>>> shard2-core1:0
>>>>>>>>> shard2-core2:0
>>>>>>>>> shard3-core1:1
>>>>>>>>> shard3-core2:1
>>>>>>>>> shard4-core1:0
>>>>>>>>> shard4-core2:0
>>>>>>>>> shard5-core1:1
>>>>>>>>> shard5-core2:1
>>>>>>>>> shard6-core1:0
>>>>>>>>> shard6-core2:0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Something interesting that I'm noticing as well: I just indexed
>>>>>>>>>> 300,000 items, and somehow 300,020 ended up in the index.  I thought
>>>>>>>>>> perhaps I messed something up, so I started the indexing again and
>>>>>>>>>> indexed another 400,000, and I see 400,064 docs.  Is there a good way
>>>>>>>>>> to find possible duplicates?  I had tried to facet on key (our id
>>>>>>>>>> field) but that didn't give me anything with more than a count of 1.
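>>>>>>>>>>
>>>>>>>>>> (The kind of facet query I have in mind - host and core name are just
>>>>>>>>>> placeholders from our setup, so treat it as a sketch - is:
>>>>>>>>>>
>>>>>>>>>> http://10.38.33.16:7575/solr/shard5-core1/select?q=*:*&rows=0&facet=true&facet.field=key&facet.mincount=2&facet.limit=-1
>>>>>>>>>>
>>>>>>>>>> though maybe the default facet.limit was hiding duplicates from me.)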
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ok, so clearing the transaction log allowed things to go again.  I
>>>>>>>>>>> am going to clear the index and try to replicate the problem on
>>>>>>>>>>> 4.2.0, and then I'll try on 4.2.1.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> No, not that I know of, which is why I say we need to get to the
>>>>>>>>>>>> bottom of it.
>>>>>>>>>>>> 
>>>>>>>>>>>> - Mark
>>>>>>>>>>>> 
>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Mark,
>>>>>>>>>>>>> Is there a particular jira issue that you think may address this?
>>>>>>>>>>>>> I read through it quickly but didn't see one that jumped out.
>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing.  I can
>>>>>>>>>>>>>> clear the index and try 4.2.1.  I will save off the logs and see
>>>>>>>>>>>>>> if there is anything else odd.
>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> It would appear it's a bug, given what you have said.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any other exceptions would be useful.  Might be best to start
>>>>>>>>>>>>>>> tracking this in a JIRA issue as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back up again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to get
>>>>>>>>>>>>>>> to the bottom of this and fix it, or determine if it's fixed in
>>>>>>>>>>>>>>> 4.2.1 (spreading to mirrors now).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question.  Is there anything
>>>>>>>>>>>>>>>> else that I should be looking for here, and is this a bug?  I'd
>>>>>>>>>>>>>>>> be happy to troll through the logs further if more information
>>>>>>>>>>>>>>>> is needed, just let me know.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, what is the most appropriate mechanism to fix this?  Is
>>>>>>>>>>>>>>>> it required to kill the index that is out of sync and let solr
>>>>>>>>>>>>>>>> resync things?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> sorry for spamming here....
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode:
>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
>>>>>>>>>>>>>>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
>>>>>>>>>>>>>>>>> status:503, message:Service Unavailable
>>>>>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>>>>>>>>>>    at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> here is another one that looks interesting
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says
>>>>>>>>>>>>>>>>>> we are the leader, but locally we don't think so
>>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there were
>>>>>>>>>>>>>>>>>>> shards that went down.  I am seeing things like what is below.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here.
>>>>>>>>>>>>>>>>>>>> Peersync does not look at that - it looks at version numbers
>>>>>>>>>>>>>>>>>>>> for updates in the transaction log - it compares the last 100
>>>>>>>>>>>>>>>>>>>> of them on leader and replica.  What it's saying is that the
>>>>>>>>>>>>>>>>>>>> replica seems to have versions that the leader does not.
>>>>>>>>>>>>>>>>>>>> Have you scanned the logs for any interesting exceptions?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing?  Did any zk
>>>>>>>>>>>>>>>>>>>> session timeouts occur?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
>>>>>>>>>>>>>>>>>>>>> strange issue while testing today.  Specifically, the replica has a
>>>>>>>>>>>>>>>>>>>>> higher version than the master, which is causing the index to not
>>>>>>>>>>>>>>>>>>>>> replicate.  Because of this the replica has fewer documents than the
>>>>>>>>>>>>>>>>>>>>> master.  What could cause this, and how can I resolve it short of
>>>>>>>>>>>>>>>>>>>>> taking down the index and scping the right version in?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> MASTER:
>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>> Num Docs: 164880
>>>>>>>>>>>>>>>>>>>>> Max Doc: 164880
>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>>>>>>>>>>>> Version: 2387
>>>>>>>>>>>>>>>>>>>>> Segment Count: 23
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> REPLICA:
>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>> Num Docs: 164773
>>>>>>>>>>>>>>>>>>>>> Max Doc: 164773
>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>>>>>>>>>>>>>>>>>>>> Version: 3001
>>>>>>>>>>>>>>>>>>>>> Segment Count: 30
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> in the replica's log it says this:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client,
>>>>>>>>>>>>>>>>>>>>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>>>>> START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>>>>> Our versions are newer. ourLowThreshold=1431233788792274944
>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>>>>> DONE. sync succeeded
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> which again seems to point to it thinking it has a newer version of
>>>>>>>>>>>>>>>>>>>>> the index, so it aborts.  This happened while having 10 threads each
>>>>>>>>>>>>>>>>>>>>> indexing 10,000 items, writing to a 6 shard (1 replica each) cluster.
>>>>>>>>>>>>>>>>>>>>> Any thoughts on this or what I should look for would be appreciated.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
