Is that file still there when you look? A missing index file isn't an error I've seen much of recently.
Do those replicas have an index directory or when you look on disk, is it an index.timestamp directory? - Mark On Apr 3, 2013, at 10:01 PM, Jamie Johnson <jej2...@gmail.com> wrote: > so something is still not right. Things were going ok, but I'm seeing this > in the logs of several of the replicas > > SEVERE: Unable to create core: dsc-shard3-core1 > org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618) > at > org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967) > at > org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049) > at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) > at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435) > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797) > ... 13 more > Caused by: org.apache.solr.common.SolrException: Error opening Reader > at > org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) > at > org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183) > at > org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179) > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411) > ... 15 more > Caused by: java.io.FileNotFoundException: > /cce2/solr/data/dsc-shard3-core1/index/_13x.si (No such file or directory) > at java.io.RandomAccessFile.open(Native Method) > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) > at > org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193) > at > org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232) > at > org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:50) > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:301) > at > org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56) > at > org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783) > at > org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) > at > org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88) > at > org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34) > at > org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169) > ... 18 more > > > > On Wed, Apr 3, 2013 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> wrote: > >> Thanks I will try that. 
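A rough sketch of the on-disk check Mark is asking about above, using the path from the FileNotFoundException in the stack trace. The index.properties detail is an assumption about the stock 4.x layout rather than something confirmed in this thread: Solr only writes it when it has switched a core over to a timestamped index directory during replication/recovery.

# Does the core have a plain index/ directory, an index.<timestamp> directory, or both?
ls -l /cce2/solr/data/dsc-shard3-core1/

# If present, index.properties names the directory Solr currently treats as the live index
cat /cce2/solr/data/dsc-shard3-core1/index.properties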
>> >> >> On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller <markrmil...@gmail.com> wrote: >> >>> >>> >>> On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2...@gmail.com> wrote: >>> >>>> I am not using the concurrent low pause garbage collector, I could look >>> at >>>> switching, I'm assuming you're talking about adding >>> -XX:+UseConcMarkSweepGC >>>> correct? >>> >>> Right - if you don't do that, the default is almost always the throughput >>> collector (I've only seen OSX buck this trend when apple handled java). >>> That means stop the world garbage collections, so with larger heaps, that >>> can be a fair amount of time that no threads can run. It's not that great >>> for something as interactive as search generally is anyway, but it's always >>> not that great when added to heavy load and a 15 sec session timeout >>> between solr and zk. >>> >>> >>> The below is odd - a replica node is waiting for the leader to see it as >>> recovering and live - live means it has created an ephemeral node for that >>> Solr corecontainer in zk - it's very strange if that didn't happen, unless >>> this happened during shutdown or something. >>> >>>> >>>> I also just had a shard go down and am seeing this in the log >>>> >>>> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on >>> state >>>> down for 10.38.33.17:7576_solr but I still do not see the requested >>> state. >>>> I see state: recovering live:false >>>> at >>>> >>> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890) >>>> at >>>> >>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186) >>>> at >>>> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>>> at >>>> >>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591) >>>> at >>>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192) >>>> at >>>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) >>>> at >>>> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) >>>> at >>>> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) >>>> at >>>> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) >>>> at >>>> >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) >>>> at >>>> >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) >>>> >>>> Nothing other than this in the log jumps out as interesting though. >>>> >>>> >>>> On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmil...@gmail.com> >>> wrote: >>>> >>>>> This shouldn't be a problem though, if things are working as they are >>>>> supposed to. Another node should simply take over as the overseer and >>>>> continue processing the work queue. It's just best if you configure so >>> that >>>>> session timeouts don't happen unless a node is really down. On the >>> other >>>>> hand, it's nicer to detect that faster. Your tradeoff to make. >>>>> >>>>> - Mark >>>>> >>>>> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmil...@gmail.com> wrote: >>>>> >>>>>> Yeah. Are you using the concurrent low pause garbage collector? >>>>>> >>>>>> This means the overseer wasn't able to communicate with zk for 15 >>>>> seconds - due to load or gc or whatever. 
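A sketch of the startup flags being discussed in this part of the thread: switching from the default throughput collector to CMS, and raising the ZooKeeper session timeout. The heap size is a placeholder, and passing -DzkClientTimeout assumes the stock 4.x example solr.xml (which substitutes ${zkClientTimeout:15000} into the <cores> element); the other -D options from the start command used elsewhere in the thread would stay as they are.

java -server -Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -DzkClientTimeout=30000 \
  -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
  -Djetty.port=7575 -DhostPort=7575 -jar start.jar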
If you can't resolve the root >>>>> cause of that, or the load just won't allow for it, next best thing >>> you can >>>>> do is raise it to 30 seconds. >>>>>> >>>>>> - Mark >>>>>> >>>>>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2...@gmail.com> wrote: >>>>>> >>>>>>> I am occasionally seeing this in the log, is this just a timeout >>> issue? >>>>>>> Should I be increasing the zk client timeout? >>>>>>> >>>>>>> WARNING: Overseer cannot talk to ZK >>>>>>> Apr 3, 2013 11:14:25 PM >>>>>>> org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process >>>>>>> INFO: Watcher fired on path: null state: Expired type None >>>>>>> Apr 3, 2013 11:14:25 PM >>>>> org.apache.solr.cloud.Overseer$ClusterStateUpdater >>>>>>> run >>>>>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop >>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException: >>>>>>> KeeperErrorCode = Session expired for /overseer/queue >>>>>>> at >>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:127) >>>>>>> at >>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>>>>> at >>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131) >>>>>>> at >>>>>>> >>> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326) >>>>>>> at >>>>>>> >>>>> >>> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128) >>>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2...@gmail.com> >>>>> wrote: >>>>>>> >>>>>>>> just an update, I'm at 1M records now with no issues. This looks >>>>>>>> promising as to the cause of my issues, thanks for the help. Is the >>>>>>>> routing method with numShards documented anywhere? I know >>> numShards is >>>>>>>> documented but I didn't know that the routing changed if you don't >>>>> specify >>>>>>>> it. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2...@gmail.com> >>>>> wrote: >>>>>>>> >>>>>>>>> with these changes things are looking good, I'm up to 600,000 >>>>> documents >>>>>>>>> without any issues as of right now. I'll keep going and add more >>> to >>>>> see if >>>>>>>>> I find anything. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2...@gmail.com> >>>>> wrote: >>>>>>>>> >>>>>>>>>> ok, so that's not a deal breaker for me. I just changed it to >>> match >>>>> the >>>>>>>>>> shards that are auto created and it looks like things are happy. >>>>> I'll go >>>>>>>>>> ahead and try my test to see if I can get things out of sync. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller < >>> markrmil...@gmail.com >>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I had thought you could - but looking at the code recently, I >>> don't >>>>>>>>>>> think you can anymore. 
I think that's a technical limitation more >>>>> than >>>>>>>>>>> anything though. When these changes were made, I think support >>> for >>>>> that was >>>>>>>>>>> simply not added at the time. >>>>>>>>>>> >>>>>>>>>>> I'm not sure exactly how straightforward it would be, but it >>> seems >>>>>>>>>>> doable - as it is, the overseer will preallocate shards when >>> first >>>>> creating >>>>>>>>>>> the collection - that's when they get named shard(n). There would >>>>> have to >>>>>>>>>>> be logic to replace shard(n) with the custom shard name when the >>>>> core >>>>>>>>>>> actually registers. >>>>>>>>>>> >>>>>>>>>>> - Mark >>>>>>>>>>> >>>>>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> >>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> answered my own question, it now says compositeId. What is >>>>>>>>>>> problematic >>>>>>>>>>>> though is that in addition to my shards (which are say >>>>> jamie-shard1) >>>>>>>>>>> I see >>>>>>>>>>>> the solr created shards (shard1). I assume that these were >>> created >>>>>>>>>>> because >>>>>>>>>>>> of the numShards param. Is there no way to specify the names of >>>>> these >>>>>>>>>>>> shards? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson < >>> jej2...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> ah interesting....so I need to specify num shards, blow out zk >>> and >>>>>>>>>>> then >>>>>>>>>>>>> try this again to see if things work properly now. What is >>> really >>>>>>>>>>> strange >>>>>>>>>>>>> is that for the most part things have worked right and on >>> 4.2.1 I >>>>>>>>>>> have >>>>>>>>>>>>> 600,000 items indexed with no duplicates. In any event I will >>>>>>>>>>> specify num >>>>>>>>>>>>> shards clear out zk and begin again. If this works properly >>> what >>>>>>>>>>> should >>>>>>>>>>>>> the router type be? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller < >>>>> markrmil...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit >>> doc >>>>>>>>>>> router >>>>>>>>>>>>>> and it's up to you to distribute updates. In the past, >>>>> partitioning >>>>>>>>>>> was >>>>>>>>>>>>>> done on the fly - but for shard splitting and perhaps other >>>>>>>>>>> features, we >>>>>>>>>>>>>> now divvy up the hash range up front based on numShards and >>> store >>>>>>>>>>> it in >>>>>>>>>>>>>> ZooKeeper. No numShards is now how you take complete control >>> of >>>>>>>>>>> updates >>>>>>>>>>>>>> yourself. >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Mark >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The router says "implicit". I did start from a blank zk >>> state >>>>> but >>>>>>>>>>>>>> perhaps >>>>>>>>>>>>>>> I missed one of the ZkCLI commands? One of my shards from >>> the >>>>>>>>>>>>>>> clusterstate.json is shown below. What is the process that >>>>> should >>>>>>>>>>> be >>>>>>>>>>>>>> done >>>>>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I listed >>>>>>>>>>> above? 
My >>>>>>>>>>>>>>> process right now is run those ZkCLI commands and then start >>>>> solr >>>>>>>>>>> on >>>>>>>>>>>>>> all of >>>>>>>>>>>>>>> the instances with a command like this >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1 >>>>>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1 >>>>>>>>>>>>>> -Dcollection.configName=solr-conf >>>>>>>>>>>>>>> -Dcollection=collection1 >>>>>>>>>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 >>>>>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I feel like maybe I'm missing a step. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "shard5":{ >>>>>>>>>>>>>>> "state":"active", >>>>>>>>>>>>>>> "replicas":{ >>>>>>>>>>>>>>> "10.38.33.16:7575_solr_shard5-core1":{ >>>>>>>>>>>>>>> "shard":"shard5", >>>>>>>>>>>>>>> "state":"active", >>>>>>>>>>>>>>> "core":"shard5-core1", >>>>>>>>>>>>>>> "collection":"collection1", >>>>>>>>>>>>>>> "node_name":"10.38.33.16:7575_solr", >>>>>>>>>>>>>>> "base_url":"http://10.38.33.16:7575/solr", >>>>>>>>>>>>>>> "leader":"true"}, >>>>>>>>>>>>>>> "10.38.33.17:7577_solr_shard5-core2":{ >>>>>>>>>>>>>>> "shard":"shard5", >>>>>>>>>>>>>>> "state":"recovering", >>>>>>>>>>>>>>> "core":"shard5-core2", >>>>>>>>>>>>>>> "collection":"collection1", >>>>>>>>>>>>>>> "node_name":"10.38.33.17:7577_solr", >>>>>>>>>>>>>>> "base_url":"http://10.38.33.17:7577/solr"}}} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller < >>>>> markrmil...@gmail.com >>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have >>>>>>>>>>> reported >>>>>>>>>>>>>>>> trouble upgrading a previous zk install when this change >>> came. >>>>> I >>>>>>>>>>>>>>>> recommended manually updating the clusterstate.json to have >>> the >>>>>>>>>>> right >>>>>>>>>>>>>> info, >>>>>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to >>> start >>>>>>>>>>> from a >>>>>>>>>>>>>> clean >>>>>>>>>>>>>>>> zk state. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If you don't have that range information, I think there >>> will be >>>>>>>>>>>>>> trouble. >>>>>>>>>>>>>>>> Do you have an router type defined in the clusterstate.json? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Mark >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson < >>> jej2...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Where is this information stored in ZK? I don't see it in >>> the >>>>>>>>>>> cluster >>>>>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Perhaps something with my process is broken. What I do >>> when I >>>>>>>>>>> start >>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>> scratch is the following >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ZkCLI -cmd upconfig ... >>>>>>>>>>>>>>>>> ZkCLI -cmd linkconfig .... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> but I don't ever explicitly create the collection. What >>>>> should >>>>>>>>>>> the >>>>>>>>>>>>>> steps >>>>>>>>>>>>>>>>> from scratch be? I am moving from an unreleased snapshot >>> of >>>>> 4.0 >>>>>>>>>>> so I >>>>>>>>>>>>>>>> never >>>>>>>>>>>>>>>>> did that previously either so perhaps I did create the >>>>>>>>>>> collection in >>>>>>>>>>>>>> one >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> my steps to get this working but have forgotten it along >>> the >>>>> way. 
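Two hedged sketches of how a collection ends up with the compositeId router and pre-assigned hash ranges, since the ZkCLI upconfig/linkconfig steps alone never create the collection. The shard count, config name, hosts and core names are the ones used elsewhere in this thread; the numShards system-property fallback and the Collections API parameters are the standard 4.x ones, but verify them against your version.

# Option 1: keep the solr.xml-driven startup, but pass numShards when the
# collection is first registered, and drop -Dshard so Solr assigns the
# auto-created names (shard1..shard6) and their hash ranges:
java -server -DnumShards=6 -DcoreName=shard5-core1 \
  -Dsolr.data.dir=/solr/data/shard5-core1 \
  -Dcollection.configName=solr-conf -Dcollection=collection1 \
  -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
  -Djetty.port=7575 -DhostPort=7575 -jar start.jar

# Option 2: create the collection explicitly through the Collections API
# after the config has been uploaded and linked:
curl 'http://10.38.33.16:7575/solr/admin/collections?action=CREATE&name=collection1&numShards=6&replicationFactor=2'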
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller < >>>>>>>>>>> markrmil...@gmail.com> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are >>> assigned up >>>>>>>>>>> front >>>>>>>>>>>>>>>> when a >>>>>>>>>>>>>>>>>> collection is created - each shard gets a range, which is >>>>>>>>>>> stored in >>>>>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the same >>> id >>>>> on >>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>> shards - something very odd going on. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you >>> reproduce. >>>>>>>>>>> Ideally >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>> can capture it in a test case. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Mark >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson < >>> jej2...@gmail.com >>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the >>>>>>>>>>> parameter >>>>>>>>>>>>>> set I >>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>> seeing this behavior. I've been able to duplicate it on >>>>> 4.2.0 >>>>>>>>>>> by >>>>>>>>>>>>>>>>>> indexing >>>>>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I get >>> to >>>>>>>>>>> 400,000 >>>>>>>>>>>>>> or >>>>>>>>>>>>>>>>>> so. >>>>>>>>>>>>>>>>>>> I will try this on 4.2.1. to see if I see the same >>> behavior >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson < >>>>>>>>>>> jej2...@gmail.com> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Since I don't have that many items in my index I >>> exported >>>>> all >>>>>>>>>>> of >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> keys >>>>>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that >>> checks >>>>> for >>>>>>>>>>>>>>>>>> duplicates. >>>>>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a grep >>> of >>>>> the >>>>>>>>>>>>>> files >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the >>> wrong >>>>>>>>>>> places. >>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and >>> shard >>>>> 5. >>>>>>>>>>> Is >>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>> possible that the hash is being calculated taking into >>>>>>>>>>> account only >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> "live" nodes? I know that we don't specify the >>> numShards >>>>>>>>>>> param @ >>>>>>>>>>>>>>>>>> startup >>>>>>>>>>>>>>>>>>>> so could this be what is happening? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" * >>>>>>>>>>>>>>>>>>>> shard1-core1:0 >>>>>>>>>>>>>>>>>>>> shard1-core2:0 >>>>>>>>>>>>>>>>>>>> shard2-core1:0 >>>>>>>>>>>>>>>>>>>> shard2-core2:0 >>>>>>>>>>>>>>>>>>>> shard3-core1:1 >>>>>>>>>>>>>>>>>>>> shard3-core2:1 >>>>>>>>>>>>>>>>>>>> shard4-core1:0 >>>>>>>>>>>>>>>>>>>> shard4-core2:0 >>>>>>>>>>>>>>>>>>>> shard5-core1:1 >>>>>>>>>>>>>>>>>>>> shard5-core2:1 >>>>>>>>>>>>>>>>>>>> shard6-core1:0 >>>>>>>>>>>>>>>>>>>> shard6-core2:0 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson < >>>>>>>>>>> jej2...@gmail.com> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I just >>>>>>>>>>> indexed >>>>>>>>>>>>>>>> 300,000 >>>>>>>>>>>>>>>>>>>>> items, and some how 300,020 ended up in the index. 
I >>>>> thought >>>>>>>>>>>>>>>> perhaps I >>>>>>>>>>>>>>>>>>>>> messed something up so I started the indexing again and >>>>>>>>>>> indexed >>>>>>>>>>>>>>>> another >>>>>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs. Is there a good way to >>>>> find >>>>>>>>>>>>>>>> possibile >>>>>>>>>>>>>>>>>>>>> duplicates? I had tried to facet on key (our id field) >>>>> but >>>>>>>>>>> that >>>>>>>>>>>>>>>> didn't >>>>>>>>>>>>>>>>>>>>> give me anything with more than a count of 1. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson < >>>>>>>>>>> jej2...@gmail.com> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to >>> go >>>>>>>>>>> again. >>>>>>>>>>>>>> I >>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the >>>>> problem on >>>>>>>>>>>>>> 4.2.0 >>>>>>>>>>>>>>>>>> and then >>>>>>>>>>>>>>>>>>>>>> I'll try on 4.2.1 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller < >>>>>>>>>>>>>> markrmil...@gmail.com >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> No, not that I know if, which is why I say we need to >>>>> get >>>>>>>>>>> to the >>>>>>>>>>>>>>>>>> bottom >>>>>>>>>>>>>>>>>>>>>>> of it. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> - Mark >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson < >>>>>>>>>>> jej2...@gmail.com> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Mark >>>>>>>>>>>>>>>>>>>>>>>> It's there a particular jira issue that you think >>> may >>>>>>>>>>> address >>>>>>>>>>>>>>>> this? >>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>> read >>>>>>>>>>>>>>>>>>>>>>>> through it quickly but didn't see one that jumped >>> out >>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" < >>>>>>>>>>> jej2...@gmail.com> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did >>>>>>>>>>> nothing. I >>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>>>>>>> clear >>>>>>>>>>>>>>>>>>>>>>>>> the index and try4.2.1. I will save off the logs >>> and >>>>> see >>>>>>>>>>> if >>>>>>>>>>>>>> there >>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>>>> anything else odd >>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" < >>>>>>>>>>> markrmil...@gmail.com> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have >>> said. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be >>> best >>>>> to >>>>>>>>>>> start >>>>>>>>>>>>>>>>>>>>>>> tracking in >>>>>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back >>>>> again. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really >>>>> need >>>>>>>>>>> to >>>>>>>>>>>>>> get >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's >>>>> fixed in >>>>>>>>>>>>>> 4.2.1 >>>>>>>>>>>>>>>>>>>>>>> (spreading >>>>>>>>>>>>>>>>>>>>>>>>>> to mirrors now). 
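Two hedged ways to double-check the duplicate-key situation described above. The facet approach can miss duplicates at the default settings because distributed faceting only refines each shard's top terms, so forcing the full term list (expensive, but workable at a few hundred thousand documents) with a minimum count of 2 gives exact numbers; "key" is the id field named earlier in the thread. The second check reuses the per-shard key exports from the grep above, one exported file per shard.

# Exact cross-shard duplicate check by faceting on the id field
curl 'http://10.38.33.16:7575/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=key&facet.limit=-1&facet.mincount=2'

# Ids that appear in more than one shard, using one exported key file per
# shard (each core's export is already unique within itself)
sort shard1-core1 shard2-core1 shard3-core1 shard4-core1 shard5-core1 shard6-core1 | uniq -d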
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> - Mark >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson < >>>>>>>>>>> jej2...@gmail.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is >>> there >>>>>>>>>>> anything >>>>>>>>>>>>>>>> else >>>>>>>>>>>>>>>>>>>>>>> that I >>>>>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug? >>> I'd >>>>> be >>>>>>>>>>> happy >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> troll >>>>>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information is >>>>>>>>>>> needed, just >>>>>>>>>>>>>>>> let >>>>>>>>>>>>>>>>>> me >>>>>>>>>>>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to >>> fix >>>>>>>>>>> this. >>>>>>>>>>>>>> Is it >>>>>>>>>>>>>>>>>>>>>>>>>> required to >>>>>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let solr >>>>> resync >>>>>>>>>>>>>> things? >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson < >>>>>>>>>>>>>>>> jej2...@gmail.com >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here.... >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues >>>>>>>>>>> with... >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM >>>>>>>>>>> org.apache.solr.common.SolrException >>>>>>>>>>>>>>>> log >>>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException >>>>>>>>>>>>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>>>>>>>> Server at >>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2returned >>>>>>>>>>>>>>>>>>>>>>> non >>>>>>>>>>>>>>>>>>>>>>>>>> ok >>>>>>>>>>>>>>>>>>>>>>>>>>>> status:503, message:Service Unavailable >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >>>>>>>>>>>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson < >>>>>>>>>>>>>>>>>> jej2...@gmail.com> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM >>>>>>>>>>>>>> org.apache.solr.common.SolrException >>>>>>>>>>>>>>>> log >>>>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: >>>>>>>>>>> ClusterState >>>>>>>>>>>>>>>> says >>>>>>>>>>>>>>>>>>>>>>> we are >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the leader, but locally we don't think so >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> 
>>>>> >>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>> >>>>> org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson < >>>>>>>>>>>>>>>>>> jej2...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some >>> point >>>>>>>>>>> there >>>>>>>>>>>>>> were >>>>>>>>>>>>>>>>>>>>>>> shards >>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> went down. I am seeing things like what is >>>>> below. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> NFO: A cluster state change: WatchedEvent >>>>>>>>>>>>>>>> state:SyncConnected >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has >>>>>>>>>>> occurred - >>>>>>>>>>>>>>>>>>>>>>>>>> updating... (live >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nodes size: 12) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.common.cloud.ZkStateReader$3 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the >>> leader. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's >>>>> okay >>>>>>>>>>> to be >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> leader. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller < >>>>>>>>>>>>>>>>>>>>>>> markrmil...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking >>> of >>>>>>>>>>> apply >>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>>>> Peersync >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not look at that - it looks at version >>>>>>>>>>> numbers for >>>>>>>>>>>>>>>>>>>>>>> updates in >>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transaction log - it compares the last 100 of >>>>> them >>>>>>>>>>> on >>>>>>>>>>>>>>>> leader >>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>> replica. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to >>>>> have >>>>>>>>>>>>>> versions >>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>> the leader >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. Have you scanned the logs for any >>>>>>>>>>> interesting >>>>>>>>>>>>>>>>>>>>>>> exceptions? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy >>> indexing? >>>>>>>>>>> Did >>>>>>>>>>>>>> any zk >>>>>>>>>>>>>>>>>>>>>>> session >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> timeouts occur? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson < >>>>>>>>>>>>>>>> jej2...@gmail.com >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr >>>>> cluster >>>>>>>>>>> to >>>>>>>>>>>>>> 4.2 >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>> noticed a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strange issue while testing today. >>>>> Specifically >>>>>>>>>>> the >>>>>>>>>>>>>>>> replica >>>>>>>>>>>>>>>>>>>>>>> has a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> higher >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version than the master which is causing the >>>>>>>>>>> index to >>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>>>>>>>> replicate. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer >>> documents >>>>>>>>>>> than >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> master. >>>>>>>>>>>>>>>>>>>>>>>>>> What >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could cause this and how can I resolve it >>>>> short of >>>>>>>>>>>>>> taking >>>>>>>>>>>>>>>>>>>>>>> down the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and scping the right version in? 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified:about an hour ago >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164880 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164880 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:2387 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:23 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164773 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164773 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:3001 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:30 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the replicas log it says this: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >>>>>>>>>>> org.apache.solr.update.PeerSync >>>>>>>>>>>>>>>> sync >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> url= >>> http://10.38.33.17:7577/solrSTARTreplicas=[ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> http://10.38.33.16:7575/solr/dsc-shard5-core1/ >>>>> ] >>>>>>>>>>>>>>>>>> nUpdates=100 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >>>>>>>>>>> org.apache.solr.update.PeerSync >>>>>>>>>>>>>>>>>>>>>>>>>> handleVersions >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url= >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Received 100 versions from >>>>>>>>>>>>>>>>>>>>>>> 10.38.33.16:7575/solr/dsc-shard5-core1/ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >>>>>>>>>>> org.apache.solr.update.PeerSync >>>>>>>>>>>>>>>>>>>>>>>>>> handleVersions >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url= >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr Our >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> versions are newer. >>>>>>>>>>> ourLowThreshold=1431233788792274944 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >>>>>>>>>>> org.apache.solr.update.PeerSync >>>>>>>>>>>>>>>> sync >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> url=http://10.38.33.17:7577/solrDONE. sync >>>>>>>>>>> succeeded >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it >>>>> has a >>>>>>>>>>>>>> newer >>>>>>>>>>>>>>>>>>>>>>> version of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index so it aborts. 
This happened while >>>>> having 10 >>>>>>>>>>>>>> threads >>>>>>>>>>>>>>>>>>>>>>> indexing >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 10,000 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> items writing to a 6 shard (1 replica each) >>>>>>>>>>> cluster. >>>>>>>>>>>>>> Any >>>>>>>>>>>>>>>>>>>>>>> thoughts >>>>>>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or what I should look for would be >>> appreciated. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>>> >>> >>> >>
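For the version mismatch in the original message at the bottom of this thread, two hedged sketches. The first compares what leader and replica actually report, assuming the /replication handler is defined in solrconfig.xml as it is in the stock configs. The second is the manual recovery used further up the thread (clearing the transaction log so the replica stops thinking its versions are newer and does a full recovery from the leader); <dataDir> is a placeholder for the core's data directory, where the tlog lives by default.

# Compare index version and generation on leader and replica
curl 'http://10.38.33.16:7575/solr/dsc-shard5-core1/replication?command=details&wt=json'
curl 'http://10.38.33.17:7577/solr/dsc-shard5-core2/replication?command=details&wt=json'

# With the replica stopped: move the transaction log aside, restart, and
# let the core recover fully from the leader
mv <dataDir>/tlog <dataDir>/tlog.bak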