Something interesting that I'm noticing as well: I just indexed 300,000
items, and somehow 300,020 ended up in the index.  I thought perhaps I
had messed something up, so I started the indexing again, indexed another
400,000, and now I see 400,064 docs.  Is there a good way to find possible
duplicates?  I had tried to facet on key (our id field), but that didn't
give me anything with a count of more than 1.
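
In case it helps, the kind of facet query I would expect to surface
duplicate keys looks roughly like the following (host, port, and
collection name are just placeholders for our setup; "key" is our unique
id field):

http://host:port/solr/collection/select?q=*:*&rows=0&facet=true&facet.field=key&facet.mincount=2

i.e. only return facet values that occur more than once.  If there is a
better way to track these extra docs down, I'd be happy to try it.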


On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> Ok, so clearing the transaction log allowed things to go again.  I am
> going to clear the index and try to replicate the problem on 4.2.0 and then
> I'll try on 4.2.1
>
>
> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> No, not that I know of, which is why I say we need to get to the bottom
>> of it.
>>
>> - Mark
>>
>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>
>> > Mark
>> > Is there a particular JIRA issue that you think may address this?  I read
>> > through it quickly but didn't see one that jumped out.
>> > On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>> >
>> >> I brought the bad one down and back up and it did nothing.  I can clear
>> >> the index and try 4.2.1.  I will save off the logs and see if there is
>> >> anything else odd.
>> >> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>> >>
>> >>> It would appear it's a bug given what you have said.
>> >>>
>> >>> Any other exceptions would be useful.  Might be best to start tracking
>> >>> this in a JIRA issue as well.
>> >>>
>> >>> To fix it, I'd bring the node that is behind down and back up again.
>> >>>
>> >>> Unfortunately, I'm pressed for time, but we really need to get to the
>> >>> bottom of this and fix it, or determine if it's fixed in 4.2.1
>> >>> (spreading to mirrors now).
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>>
>> >>>> Sorry, I didn't ask the obvious question.  Is there anything else that I
>> >>>> should be looking for here, and is this a bug?  I'd be happy to troll
>> >>>> through the logs further if more information is needed, just let me
>> >>>> know.
>> >>>>
>> >>>> Also, what is the most appropriate mechanism to fix this?  Is it
>> >>>> required to kill the index that is out of sync and let Solr resync things?
>> >>>>
>> >>>>
>> >>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>>>
>> >>>>> sorry for spamming here....
>> >>>>>
>> >>>>> shard5-core2 is the instance we're having issues with...
>> >>>>>
>> >>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>>>> SEVERE: shard update error StdNode:
>> >>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
>> >>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
>> >>>>> status:503, message:Service Unavailable
>> >>>>>       at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>> >>>>>       at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>>>>       at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>> >>>>>       at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>> >>>>>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>>>>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>>>>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >>>>>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>>>>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >>>>>       at java.lang.Thread.run(Thread.java:662)
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>>>>
>> >>>>>> here is another one that looks interesting
>> >>>>>>
>> >>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>> >>>>>> the leader, but locally we don't think so
>> >>>>>>       at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>> >>>>>>       at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>> >>>>>>       at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>> >>>>>>       at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>> >>>>>>       at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>> >>>>>>       at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>> >>>>>>       at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> >>>>>>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> >>>>>>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >>>>>>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>> >>>>>>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>> >>>>>>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Looking at the master, it looks like at some point there were shards
>> >>>>>>> that went down.  I am seeing things like what is below.
>> >>>>>>>
>> >>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>> >>>>>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating...
>> >>>>>>> (live nodes size: 12)
>> >>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>> >>>>>>> INFO: Updating live nodes... (9)
>> >>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> >>>>>>> INFO: Running the leader process.
>> >>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> >>>>>>> INFO: Checking if I should try and be the leader.
>> >>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> >>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>> >>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> >>>>>>> INFO: I may be the new leader - try and sync
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> I don't think the versions you are thinking of apply here.  PeerSync
>> >>>>>>>> does not look at that - it looks at version numbers for updates in the
>> >>>>>>>> transaction log - it compares the last 100 of them on leader and replica.
>> >>>>>>>> What it's saying is that the replica seems to have versions that the
>> >>>>>>>> leader does not.  Have you scanned the logs for any interesting exceptions?
>> >>>>>>>>
>> >>>>>>>> Did the leader change during the heavy indexing?  Did any zk session
>> >>>>>>>> timeouts occur?
>> >>>>>>>> - Mark
>> >>>>>>>>
>> >>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
>> >>>>>>>>> strange issue while testing today.  Specifically, the replica has a higher
>> >>>>>>>>> version than the master, which is causing the index to not replicate.
>> >>>>>>>>> Because of this the replica has fewer documents than the master.  What
>> >>>>>>>>> could cause this, and how can I resolve it short of taking down the index
>> >>>>>>>>> and scp'ing the right version in?
>> >>>>>>>>>
>> >>>>>>>>> MASTER:
>> >>>>>>>>> Last Modified:about an hour ago
>> >>>>>>>>> Num Docs:164880
>> >>>>>>>>> Max Doc:164880
>> >>>>>>>>> Deleted Docs:0
>> >>>>>>>>> Version:2387
>> >>>>>>>>> Segment Count:23
>> >>>>>>>>>
>> >>>>>>>>> REPLICA:
>> >>>>>>>>> Last Modified: about an hour ago
>> >>>>>>>>> Num Docs:164773
>> >>>>>>>>> Max Doc:164773
>> >>>>>>>>> Deleted Docs:0
>> >>>>>>>>> Version:3001
>> >>>>>>>>> Segment Count:30
>> >>>>>>>>>
>> >>>>>>>>> in the replica's log it says this:
>> >>>>>>>>>
>> >>>>>>>>> INFO: Creating new http client,
>> >>>>>>>>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>> >>>>>>>>>
>> >>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>> >>>>>>>>> START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>> >>>>>>>>>
>> >>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>> >>>>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>> >>>>>>>>>
>> >>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>> >>>>>>>>> Our versions are newer. ourLowThreshold=1431233788792274944
>> >>>>>>>>> otherHigh=1431233789440294912
>> >>>>>>>>>
>> >>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>> >>>>>>>>> DONE. sync succeeded
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> which again seems to indicate that it thinks it has a newer version of
>> >>>>>>>>> the index, so it aborts.  This happened while having 10 threads indexing
>> >>>>>>>>> 10,000 items, writing to a 6-shard (1 replica each) cluster.  Any thoughts
>> >>>>>>>>> on this or what I should look for would be appreciated.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>>
>>
>>
>
