So something is still not right. Things were going OK, but I'm seeing this in the logs of several of the replicas:
SEVERE: Unable to create core: dsc-shard3-core1
org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
        at org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797)
        ... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
        at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411)
        ... 15 more
Caused by: java.io.FileNotFoundException: /cce2/solr/data/dsc-shard3-core1/index/_13x.si (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
        at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
        at org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:50)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:301)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
        at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
        at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
        ... 18 more

On Wed, Apr 3, 2013 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> wrote:

> Thanks I will try that.
>
>
> On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>>
>>
>> On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>
>> > I am not using the concurrent low pause garbage collector, I could look at
>> > switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC
>> > correct?
>>
>> Right - if you don't do that, the default is almost always the throughput
>> collector (I've only seen OSX buck this trend when apple handled java).
>> That means stop the world garbage collections, so with larger heaps, that
>> can be a fair amount of time that no threads can run. 
It's not that great >> for something as interactive as search generally is anyway, but it's always >> not that great when added to heavy load and a 15 sec session timeout >> between solr and zk. >> >> >> The below is odd - a replica node is waiting for the leader to see it as >> recovering and live - live means it has created an ephemeral node for that >> Solr corecontainer in zk - it's very strange if that didn't happen, unless >> this happened during shutdown or something. >> >> > >> > I also just had a shard go down and am seeing this in the log >> > >> > SEVERE: org.apache.solr.common.SolrException: I was asked to wait on >> state >> > down for 10.38.33.17:7576_solr but I still do not see the requested >> state. >> > I see state: recovering live:false >> > at >> > >> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890) >> > at >> > >> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186) >> > at >> > >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> > at >> > >> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591) >> > at >> > >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192) >> > at >> > >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) >> > at >> > >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) >> > at >> > >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) >> > at >> > >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) >> > at >> > >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) >> > at >> > >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) >> > >> > Nothing other than this in the log jumps out as interesting though. >> > >> > >> > On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmil...@gmail.com> >> wrote: >> > >> >> This shouldn't be a problem though, if things are working as they are >> >> supposed to. Another node should simply take over as the overseer and >> >> continue processing the work queue. It's just best if you configure so >> that >> >> session timeouts don't happen unless a node is really down. On the >> other >> >> hand, it's nicer to detect that faster. Your tradeoff to make. >> >> >> >> - Mark >> >> >> >> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmil...@gmail.com> wrote: >> >> >> >>> Yeah. Are you using the concurrent low pause garbage collector? >> >>> >> >>> This means the overseer wasn't able to communicate with zk for 15 >> >> seconds - due to load or gc or whatever. If you can't resolve the root >> >> cause of that, or the load just won't allow for it, next best thing >> you can >> >> do is raise it to 30 seconds. >> >>> >> >>> - Mark >> >>> >> >>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2...@gmail.com> wrote: >> >>> >> >>>> I am occasionally seeing this in the log, is this just a timeout >> issue? >> >>>> Should I be increasing the zk client timeout? 
>> >>>> >> >>>> WARNING: Overseer cannot talk to ZK >> >>>> Apr 3, 2013 11:14:25 PM >> >>>> org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process >> >>>> INFO: Watcher fired on path: null state: Expired type None >> >>>> Apr 3, 2013 11:14:25 PM >> >> org.apache.solr.cloud.Overseer$ClusterStateUpdater >> >>>> run >> >>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop >> >>>> org.apache.zookeeper.KeeperException$SessionExpiredException: >> >>>> KeeperErrorCode = Session expired for /overseer/queue >> >>>> at >> >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:127) >> >>>> at >> >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >> >>>> at >> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468) >> >>>> at >> >>>> >> >> >> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236) >> >>>> at >> >>>> >> >> >> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233) >> >>>> at >> >>>> >> >> >> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) >> >>>> at >> >>>> >> >> >> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233) >> >>>> at >> >>>> >> >> >> org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89) >> >>>> at >> >>>> >> >> >> org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131) >> >>>> at >> >>>> >> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326) >> >>>> at >> >>>> >> >> >> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128) >> >>>> at java.lang.Thread.run(Thread.java:662) >> >>>> >> >>>> >> >>>> >> >>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2...@gmail.com> >> >> wrote: >> >>>> >> >>>>> just an update, I'm at 1M records now with no issues. This looks >> >>>>> promising as to the cause of my issues, thanks for the help. Is the >> >>>>> routing method with numShards documented anywhere? I know >> numShards is >> >>>>> documented but I didn't know that the routing changed if you don't >> >> specify >> >>>>> it. >> >>>>> >> >>>>> >> >>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2...@gmail.com> >> >> wrote: >> >>>>> >> >>>>>> with these changes things are looking good, I'm up to 600,000 >> >> documents >> >>>>>> without any issues as of right now. I'll keep going and add more >> to >> >> see if >> >>>>>> I find anything. >> >>>>>> >> >>>>>> >> >>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2...@gmail.com> >> >> wrote: >> >>>>>> >> >>>>>>> ok, so that's not a deal breaker for me. I just changed it to >> match >> >> the >> >>>>>>> shards that are auto created and it looks like things are happy. >> >> I'll go >> >>>>>>> ahead and try my test to see if I can get things out of sync. >> >>>>>>> >> >>>>>>> >> >>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller < >> markrmil...@gmail.com >> >>> wrote: >> >>>>>>> >> >>>>>>>> I had thought you could - but looking at the code recently, I >> don't >> >>>>>>>> think you can anymore. I think that's a technical limitation more >> >> than >> >>>>>>>> anything though. When these changes were made, I think support >> for >> >> that was >> >>>>>>>> simply not added at the time. >> >>>>>>>> >> >>>>>>>> I'm not sure exactly how straightforward it would be, but it >> seems >> >>>>>>>> doable - as it is, the overseer will preallocate shards when >> first >> >> creating >> >>>>>>>> the collection - that's when they get named shard(n). 
There would >> >> have to >> >>>>>>>> be logic to replace shard(n) with the custom shard name when the >> >> core >> >>>>>>>> actually registers. >> >>>>>>>> >> >>>>>>>> - Mark >> >>>>>>>> >> >>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> >> >> wrote: >> >>>>>>>> >> >>>>>>>>> answered my own question, it now says compositeId. What is >> >>>>>>>> problematic >> >>>>>>>>> though is that in addition to my shards (which are say >> >> jamie-shard1) >> >>>>>>>> I see >> >>>>>>>>> the solr created shards (shard1). I assume that these were >> created >> >>>>>>>> because >> >>>>>>>>> of the numShards param. Is there no way to specify the names of >> >> these >> >>>>>>>>> shards? >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson < >> jej2...@gmail.com> >> >>>>>>>> wrote: >> >>>>>>>>> >> >>>>>>>>>> ah interesting....so I need to specify num shards, blow out zk >> and >> >>>>>>>> then >> >>>>>>>>>> try this again to see if things work properly now. What is >> really >> >>>>>>>> strange >> >>>>>>>>>> is that for the most part things have worked right and on >> 4.2.1 I >> >>>>>>>> have >> >>>>>>>>>> 600,000 items indexed with no duplicates. In any event I will >> >>>>>>>> specify num >> >>>>>>>>>> shards clear out zk and begin again. If this works properly >> what >> >>>>>>>> should >> >>>>>>>>>> the router type be? >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller < >> >> markrmil...@gmail.com> >> >>>>>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit >> doc >> >>>>>>>> router >> >>>>>>>>>>> and it's up to you to distribute updates. In the past, >> >> partitioning >> >>>>>>>> was >> >>>>>>>>>>> done on the fly - but for shard splitting and perhaps other >> >>>>>>>> features, we >> >>>>>>>>>>> now divvy up the hash range up front based on numShards and >> store >> >>>>>>>> it in >> >>>>>>>>>>> ZooKeeper. No numShards is now how you take complete control >> of >> >>>>>>>> updates >> >>>>>>>>>>> yourself. >> >>>>>>>>>>> >> >>>>>>>>>>> - Mark >> >>>>>>>>>>> >> >>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> >> >>>>>>>> wrote: >> >>>>>>>>>>> >> >>>>>>>>>>>> The router says "implicit". I did start from a blank zk >> state >> >> but >> >>>>>>>>>>> perhaps >> >>>>>>>>>>>> I missed one of the ZkCLI commands? One of my shards from >> the >> >>>>>>>>>>>> clusterstate.json is shown below. What is the process that >> >> should >> >>>>>>>> be >> >>>>>>>>>>> done >> >>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I listed >> >>>>>>>> above? My >> >>>>>>>>>>>> process right now is run those ZkCLI commands and then start >> >> solr >> >>>>>>>> on >> >>>>>>>>>>> all of >> >>>>>>>>>>>> the instances with a command like this >> >>>>>>>>>>>> >> >>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1 >> >>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1 >> >>>>>>>>>>> -Dcollection.configName=solr-conf >> >>>>>>>>>>>> -Dcollection=collection1 >> >>>>>>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 >> >>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar >> >>>>>>>>>>>> >> >>>>>>>>>>>> I feel like maybe I'm missing a step. 
>> >>>>>>>>>>>> >> >>>>>>>>>>>> "shard5":{ >> >>>>>>>>>>>> "state":"active", >> >>>>>>>>>>>> "replicas":{ >> >>>>>>>>>>>> "10.38.33.16:7575_solr_shard5-core1":{ >> >>>>>>>>>>>> "shard":"shard5", >> >>>>>>>>>>>> "state":"active", >> >>>>>>>>>>>> "core":"shard5-core1", >> >>>>>>>>>>>> "collection":"collection1", >> >>>>>>>>>>>> "node_name":"10.38.33.16:7575_solr", >> >>>>>>>>>>>> "base_url":"http://10.38.33.16:7575/solr", >> >>>>>>>>>>>> "leader":"true"}, >> >>>>>>>>>>>> "10.38.33.17:7577_solr_shard5-core2":{ >> >>>>>>>>>>>> "shard":"shard5", >> >>>>>>>>>>>> "state":"recovering", >> >>>>>>>>>>>> "core":"shard5-core2", >> >>>>>>>>>>>> "collection":"collection1", >> >>>>>>>>>>>> "node_name":"10.38.33.17:7577_solr", >> >>>>>>>>>>>> "base_url":"http://10.38.33.17:7577/solr"}}} >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller < >> >> markrmil...@gmail.com >> >>>>>>>>> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have >> >>>>>>>> reported >> >>>>>>>>>>>>> trouble upgrading a previous zk install when this change >> came. >> >> I >> >>>>>>>>>>>>> recommended manually updating the clusterstate.json to have >> the >> >>>>>>>> right >> >>>>>>>>>>> info, >> >>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to >> start >> >>>>>>>> from a >> >>>>>>>>>>> clean >> >>>>>>>>>>>>> zk state. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> If you don't have that range information, I think there >> will be >> >>>>>>>>>>> trouble. >> >>>>>>>>>>>>> Do you have an router type defined in the clusterstate.json? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson < >> jej2...@gmail.com> >> >>>>>>>> wrote: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>>> Where is this information stored in ZK? I don't see it in >> the >> >>>>>>>> cluster >> >>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ). >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Perhaps something with my process is broken. What I do >> when I >> >>>>>>>> start >> >>>>>>>>>>> from >> >>>>>>>>>>>>>> scratch is the following >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> ZkCLI -cmd upconfig ... >> >>>>>>>>>>>>>> ZkCLI -cmd linkconfig .... >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> but I don't ever explicitly create the collection. What >> >> should >> >>>>>>>> the >> >>>>>>>>>>> steps >> >>>>>>>>>>>>>> from scratch be? I am moving from an unreleased snapshot >> of >> >> 4.0 >> >>>>>>>> so I >> >>>>>>>>>>>>> never >> >>>>>>>>>>>>>> did that previously either so perhaps I did create the >> >>>>>>>> collection in >> >>>>>>>>>>> one >> >>>>>>>>>>>>> of >> >>>>>>>>>>>>>> my steps to get this working but have forgotten it along >> the >> >> way. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller < >> >>>>>>>> markrmil...@gmail.com> >> >>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are >> assigned up >> >>>>>>>> front >> >>>>>>>>>>>>> when a >> >>>>>>>>>>>>>>> collection is created - each shard gets a range, which is >> >>>>>>>> stored in >> >>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the same >> id >> >> on >> >>>>>>>>>>>>> different >> >>>>>>>>>>>>>>> shards - something very odd going on. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you >> reproduce. >> >>>>>>>> Ideally >> >>>>>>>>>>> we >> >>>>>>>>>>>>>>> can capture it in a test case. 
>> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson < >> jej2...@gmail.com >> >>> >> >>>>>>>> wrote: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the >> >>>>>>>> parameter >> >>>>>>>>>>> set I >> >>>>>>>>>>>>>>> am >> >>>>>>>>>>>>>>>> seeing this behavior. I've been able to duplicate it on >> >> 4.2.0 >> >>>>>>>> by >> >>>>>>>>>>>>>>> indexing >> >>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I get >> to >> >>>>>>>> 400,000 >> >>>>>>>>>>> or >> >>>>>>>>>>>>>>> so. >> >>>>>>>>>>>>>>>> I will try this on 4.2.1. to see if I see the same >> behavior >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson < >> >>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Since I don't have that many items in my index I >> exported >> >> all >> >>>>>>>> of >> >>>>>>>>>>> the >> >>>>>>>>>>>>>>> keys >> >>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that >> checks >> >> for >> >>>>>>>>>>>>>>> duplicates. >> >>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a grep >> of >> >> the >> >>>>>>>>>>> files >> >>>>>>>>>>>>> for >> >>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the >> wrong >> >>>>>>>> places. >> >>>>>>>>>>>>> If >> >>>>>>>>>>>>>>> you >> >>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and >> shard >> >> 5. >> >>>>>>>> Is >> >>>>>>>>>>> it >> >>>>>>>>>>>>>>>>> possible that the hash is being calculated taking into >> >>>>>>>> account only >> >>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>> "live" nodes? I know that we don't specify the >> numShards >> >>>>>>>> param @ >> >>>>>>>>>>>>>>> startup >> >>>>>>>>>>>>>>>>> so could this be what is happening? >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" * >> >>>>>>>>>>>>>>>>> shard1-core1:0 >> >>>>>>>>>>>>>>>>> shard1-core2:0 >> >>>>>>>>>>>>>>>>> shard2-core1:0 >> >>>>>>>>>>>>>>>>> shard2-core2:0 >> >>>>>>>>>>>>>>>>> shard3-core1:1 >> >>>>>>>>>>>>>>>>> shard3-core2:1 >> >>>>>>>>>>>>>>>>> shard4-core1:0 >> >>>>>>>>>>>>>>>>> shard4-core2:0 >> >>>>>>>>>>>>>>>>> shard5-core1:1 >> >>>>>>>>>>>>>>>>> shard5-core2:1 >> >>>>>>>>>>>>>>>>> shard6-core1:0 >> >>>>>>>>>>>>>>>>> shard6-core2:0 >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson < >> >>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I just >> >>>>>>>> indexed >> >>>>>>>>>>>>> 300,000 >> >>>>>>>>>>>>>>>>>> items, and some how 300,020 ended up in the index. I >> >> thought >> >>>>>>>>>>>>> perhaps I >> >>>>>>>>>>>>>>>>>> messed something up so I started the indexing again and >> >>>>>>>> indexed >> >>>>>>>>>>>>> another >> >>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs. Is there a good way to >> >> find >> >>>>>>>>>>>>> possibile >> >>>>>>>>>>>>>>>>>> duplicates? I had tried to facet on key (our id field) >> >> but >> >>>>>>>> that >> >>>>>>>>>>>>> didn't >> >>>>>>>>>>>>>>>>>> give me anything with more than a count of 1. >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson < >> >>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to >> go >> >>>>>>>> again. 
>> >>>>>>>>>>> I >> >>>>>>>>>>>>> am >> >>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the >> >> problem on >> >>>>>>>>>>> 4.2.0 >> >>>>>>>>>>>>>>> and then >> >>>>>>>>>>>>>>>>>>> I'll try on 4.2.1 >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller < >> >>>>>>>>>>> markrmil...@gmail.com >> >>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> No, not that I know if, which is why I say we need to >> >> get >> >>>>>>>> to the >> >>>>>>>>>>>>>>> bottom >> >>>>>>>>>>>>>>>>>>>> of it. >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson < >> >>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> Mark >> >>>>>>>>>>>>>>>>>>>>> It's there a particular jira issue that you think >> may >> >>>>>>>> address >> >>>>>>>>>>>>> this? >> >>>>>>>>>>>>>>> I >> >>>>>>>>>>>>>>>>>>>> read >> >>>>>>>>>>>>>>>>>>>>> through it quickly but didn't see one that jumped >> out >> >>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" < >> >>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did >> >>>>>>>> nothing. I >> >>>>>>>>>>> can >> >>>>>>>>>>>>>>>>>>>> clear >> >>>>>>>>>>>>>>>>>>>>>> the index and try4.2.1. I will save off the logs >> and >> >> see >> >>>>>>>> if >> >>>>>>>>>>> there >> >>>>>>>>>>>>>>> is >> >>>>>>>>>>>>>>>>>>>>>> anything else odd >> >>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" < >> >>>>>>>> markrmil...@gmail.com> >> >>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have >> said. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be >> best >> >> to >> >>>>>>>> start >> >>>>>>>>>>>>>>>>>>>> tracking in >> >>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back >> >> again. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really >> >> need >> >>>>>>>> to >> >>>>>>>>>>> get >> >>>>>>>>>>>>> to >> >>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's >> >> fixed in >> >>>>>>>>>>> 4.2.1 >> >>>>>>>>>>>>>>>>>>>> (spreading >> >>>>>>>>>>>>>>>>>>>>>>> to mirrors now). >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson < >> >>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is >> there >> >>>>>>>> anything >> >>>>>>>>>>>>> else >> >>>>>>>>>>>>>>>>>>>> that I >> >>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug? >> I'd >> >> be >> >>>>>>>> happy >> >>>>>>>>>>> to >> >>>>>>>>>>>>>>>>>>>> troll >> >>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information is >> >>>>>>>> needed, just >> >>>>>>>>>>>>> let >> >>>>>>>>>>>>>>> me >> >>>>>>>>>>>>>>>>>>>>>>> know. >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to >> fix >> >>>>>>>> this. 
>> >>>>>>>>>>> Is it >> >>>>>>>>>>>>>>>>>>>>>>> required to >> >>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let solr >> >> resync >> >>>>>>>>>>> things? >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson < >> >>>>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here.... >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues >> >>>>>>>> with... >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM >> >>>>>>>> org.apache.solr.common.SolrException >> >>>>>>>>>>>>> log >> >>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException >> >>>>>>>>>>>>>>>>>>>>>>> : >> >>>>>>>>>>>>>>>>>>>>>>>>> Server at >> >>>>>>>>>>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2returned >> >>>>>>>>>>>>>>>>>>>> non >> >>>>>>>>>>>>>>>>>>>>>>> ok >> >>>>>>>>>>>>>>>>>>>>>>>>> status:503, message:Service Unavailable >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> >>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> 
>> >>>>>>>> >> >> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> >>>>>>>>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662) >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson < >> >>>>>>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM >> >>>>>>>>>>> org.apache.solr.common.SolrException >> >>>>>>>>>>>>> log >> >>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: >> >>>>>>>> ClusterState >> >>>>>>>>>>>>> says >> >>>>>>>>>>>>>>>>>>>> we are >> >>>>>>>>>>>>>>>>>>>>>>>>>> the leader, but locally we don't think so >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>> >> >> org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> 
>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) >> >>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson < >> >>>>>>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some >> point >> >>>>>>>> there >> >>>>>>>>>>> were >> >>>>>>>>>>>>>>>>>>>> shards >> >>>>>>>>>>>>>>>>>>>>>>> that >> >>>>>>>>>>>>>>>>>>>>>>>>>>> went down. I am seeing things like what is >> >> below. >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> NFO: A cluster state change: WatchedEvent >> >>>>>>>>>>>>> state:SyncConnected >> >>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has >> >>>>>>>> occurred - >> >>>>>>>>>>>>>>>>>>>>>>> updating... (live >> >>>>>>>>>>>>>>>>>>>>>>>>>>> nodes size: 12) >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>> org.apache.solr.common.cloud.ZkStateReader$3 >> >>>>>>>>>>>>>>>>>>>>>>>>>>> process >> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9) >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess >> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process. >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader >> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the >> leader. >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader >> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's >> >> okay >> >>>>>>>> to be >> >>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>> leader. >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess >> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller < >> >>>>>>>>>>>>>>>>>>>> markrmil...@gmail.com >> >>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking >> of >> >>>>>>>> apply >> >>>>>>>>>>> here. >> >>>>>>>>>>>>>>>>>>>> Peersync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not look at that - it looks at version >> >>>>>>>> numbers for >> >>>>>>>>>>>>>>>>>>>> updates in >> >>>>>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> transaction log - it compares the last 100 of >> >> them >> >>>>>>>> on >> >>>>>>>>>>>>> leader >> >>>>>>>>>>>>>>>>>>>> and >> >>>>>>>>>>>>>>>>>>>>>>> replica. 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to >> >> have >> >>>>>>>>>>> versions >> >>>>>>>>>>>>>>>>>>>> that >> >>>>>>>>>>>>>>>>>>>>>>> the leader >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. Have you scanned the logs for any >> >>>>>>>> interesting >> >>>>>>>>>>>>>>>>>>>> exceptions? >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy >> indexing? >> >>>>>>>> Did >> >>>>>>>>>>> any zk >> >>>>>>>>>>>>>>>>>>>> session >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> timeouts occur? >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson < >> >>>>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr >> >> cluster >> >>>>>>>> to >> >>>>>>>>>>> 4.2 >> >>>>>>>>>>>>> and >> >>>>>>>>>>>>>>>>>>>>>>> noticed a >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> strange issue while testing today. >> >> Specifically >> >>>>>>>> the >> >>>>>>>>>>>>> replica >> >>>>>>>>>>>>>>>>>>>> has a >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> higher >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> version than the master which is causing the >> >>>>>>>> index to >> >>>>>>>>>>> not >> >>>>>>>>>>>>>>>>>>>>>>> replicate. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer >> documents >> >>>>>>>> than >> >>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>> master. >> >>>>>>>>>>>>>>>>>>>>>>> What >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> could cause this and how can I resolve it >> >> short of >> >>>>>>>>>>> taking >> >>>>>>>>>>>>>>>>>>>> down the >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> index >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and scping the right version in? 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified:about an hour ago >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164880 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164880 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:2387 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:23 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164773 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164773 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:3001 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:30 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the replicas log it says this: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>> >> >> >> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>> sync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> url= >> http://10.38.33.17:7577/solrSTARTreplicas=[ >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> http://10.38.33.16:7575/solr/dsc-shard5-core1/ >> >> ] >> >>>>>>>>>>>>>>> nUpdates=100 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>>>>>>>>>>>> handleVersions >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url= >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Received 100 versions from >> >>>>>>>>>>>>>>>>>>>> 10.38.33.16:7575/solr/dsc-shard5-core1/ >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>>>>>>>>>>>> handleVersions >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url= >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr Our >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> versions are newer. >> >>>>>>>> ourLowThreshold=1431233788792274944 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>> sync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> url=http://10.38.33.17:7577/solrDONE. sync >> >>>>>>>> succeeded >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it >> >> has a >> >>>>>>>>>>> newer >> >>>>>>>>>>>>>>>>>>>> version of >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> index so it aborts. 
This happened while >> >> having 10 >> >>>>>>>>>>> threads >> >>>>>>>>>>>>>>>>>>>> indexing >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 10,000 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> items writing to a 6 shard (1 replica each) >> >>>>>>>> cluster. >> >>>>>>>>>>> Any >> >>>>>>>>>>>>>>>>>>>> thoughts >> >>>>>>>>>>>>>>>>>>>>>>> on >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> this >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> or what I should look for would be >> appreciated. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>> >> >>>>>> >> >>>>> >> >>> >> >> >> >> >> >> >
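
For reference, here is one way to pull the concrete suggestions from this thread together into a single bootstrap-and-start sequence. This is only a sketch, not a tested recipe: the ZkCLI lines follow the shorthand used earlier in the thread (classpath omitted), the config directory is a placeholder, -DnumShards assumes the numShards system property is picked up at collection creation as in the stock SolrCloud examples, -DzkClientTimeout assumes the stock solr.xml that reads ${zkClientTimeout:15000}, and the heap size is arbitrary. The CMS collector flag and the 30 second ZooKeeper timeout are the two changes Mark suggested above; numShards=6 and the shard/core names match the 6-shard layout being discussed.

# upload and link the config, as before
ZkCLI -cmd upconfig -zkhost so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 -confdir ./solr-conf -confname solr-conf
ZkCLI -cmd linkconfig -zkhost so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 -collection collection1 -confname solr-conf

# start each node with numShards set up front (so the hash ranges are assigned
# and the compositeId router is used), the CMS collector, and a 30s ZK timeout
java -server -Xmx4g -Xms4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -DnumShards=6 -DzkClientTimeout=30000 \
  -Dshard=shard5 -DcoreName=shard5-core1 \
  -Dsolr.data.dir=/solr/data/shard5-core1 \
  -Dcollection.configName=solr-conf -Dcollection=collection1 \
  -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
  -Djetty.port=7575 -DhostPort=7575 -jar start.jar

Note that raising the timeout only papers over the pauses: if a full GC still runs longer than 30 seconds under heavy indexing, the Overseer's ZooKeeper session will still expire.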