Sorry, that should say that none of the _* files were present, not "one"....
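For a core that fails to load with a FileNotFoundException on a segment file (like the _13x.si in the trace quoted below), one way to see what is actually left on disk is Lucene's CheckIndex tool. This is only a sketch (the jar name and classpath depend on your install, and the index path is simply the one from the stack trace quoted below):

java -cp lucene-core-4.2.0.jar org.apache.lucene.index.CheckIndex /cce2/solr/data/dsc-shard3-core1/index

By default it only reports; the -fix option permanently drops any unreadable segments and the documents in them, so it is a last resort compared to letting the replica resync its index from the leader.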
On Wed, Apr 3, 2013 at 10:16 PM, Jamie Johnson <jej2...@gmail.com> wrote: > I have since removed the files but when I had looked there was an index > directory, the only files I remember being there were the segments, one of > the _* files were present. I'll watch it to see if it happens again but it > happened on 2 of the shards while heavy indexing. > > > On Wed, Apr 3, 2013 at 10:13 PM, Mark Miller <markrmil...@gmail.com>wrote: > >> Is that file still there when you look? Not being able to find an index >> file is not a common error I've seen recently. >> >> Do those replicas have an index directory or when you look on disk, is it >> an index.timestamp directory? >> >> - Mark >> >> On Apr 3, 2013, at 10:01 PM, Jamie Johnson <jej2...@gmail.com> wrote: >> >> > so something is still not right. Things were going ok, but I'm seeing >> this >> > in the logs of several of the replicas >> > >> > SEVERE: Unable to create core: dsc-shard3-core1 >> > org.apache.solr.common.SolrException: Error opening new searcher >> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822) >> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618) >> > at >> > org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967) >> > at >> > org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049) >> > at >> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) >> > at >> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) >> > at >> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> > at >> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) >> > at >> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> > at >> > >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> > at >> > >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> > at java.lang.Thread.run(Thread.java:662) >> > Caused by: org.apache.solr.common.SolrException: Error opening new >> searcher >> > at >> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435) >> > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547) >> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797) >> > ... 13 more >> > Caused by: org.apache.solr.common.SolrException: Error opening Reader >> > at >> > >> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) >> > at >> > >> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183) >> > at >> > >> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179) >> > at >> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411) >> > ... 
15 more >> > Caused by: java.io.FileNotFoundException: >> > /cce2/solr/data/dsc-shard3-core1/index/_13x.si (No such file or >> directory) >> > at java.io.RandomAccessFile.open(Native Method) >> > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) >> > at >> > org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193) >> > at >> > >> org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232) >> > at >> > >> org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:50) >> > at >> org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:301) >> > at >> > >> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56) >> > at >> > >> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783) >> > at >> > >> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) >> > at >> > org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88) >> > at >> > >> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34) >> > at >> > >> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169) >> > ... 18 more >> > >> > >> > >> > On Wed, Apr 3, 2013 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> >> wrote: >> > >> >> Thanks I will try that. >> >> >> >> >> >> On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller <markrmil...@gmail.com> >> wrote: >> >> >> >>> >> >>> >> >>> On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2...@gmail.com> wrote: >> >>> >> >>>> I am not using the concurrent low pause garbage collector, I could >> look >> >>> at >> >>>> switching, I'm assuming you're talking about adding >> >>> -XX:+UseConcMarkSweepGC >> >>>> correct? >> >>> >> >>> Right - if you don't do that, the default is almost always the >> throughput >> >>> collector (I've only seen OSX buck this trend when apple handled >> java). >> >>> That means stop the world garbage collections, so with larger heaps, >> that >> >>> can be a fair amount of time that no threads can run. It's not that >> great >> >>> for something as interactive as search generally is anyway, but it's >> always >> >>> not that great when added to heavy load and a 15 sec session timeout >> >>> between solr and zk. >> >>> >> >>> >> >>> The below is odd - a replica node is waiting for the leader to see it >> as >> >>> recovering and live - live means it has created an ephemeral node for >> that >> >>> Solr corecontainer in zk - it's very strange if that didn't happen, >> unless >> >>> this happened during shutdown or something. >> >>> >> >>>> >> >>>> I also just had a shard go down and am seeing this in the log >> >>>> >> >>>> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on >> >>> state >> >>>> down for 10.38.33.17:7576_solr but I still do not see the requested >> >>> state. 
>> >>>> I see state: recovering live:false >> >>>> at >> >>>> >> >>> >> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890) >> >>>> at >> >>>> >> >>> >> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186) >> >>>> at >> >>>> >> >>> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> >>>> at >> >>>> >> >>> >> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591) >> >>>> at >> >>>> >> >>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192) >> >>>> at >> >>>> >> >>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) >> >>>> at >> >>>> >> >>> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) >> >>>> at >> >>>> >> >>> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) >> >>>> at >> >>>> >> >>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) >> >>>> at >> >>>> >> >>> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) >> >>>> at >> >>>> >> >>> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) >> >>>> >> >>>> Nothing other than this in the log jumps out as interesting though. >> >>>> >> >>>> >> >>>> On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmil...@gmail.com> >> >>> wrote: >> >>>> >> >>>>> This shouldn't be a problem though, if things are working as they >> are >> >>>>> supposed to. Another node should simply take over as the overseer >> and >> >>>>> continue processing the work queue. It's just best if you configure >> so >> >>> that >> >>>>> session timeouts don't happen unless a node is really down. On the >> >>> other >> >>>>> hand, it's nicer to detect that faster. Your tradeoff to make. >> >>>>> >> >>>>> - Mark >> >>>>> >> >>>>> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmil...@gmail.com> >> wrote: >> >>>>> >> >>>>>> Yeah. Are you using the concurrent low pause garbage collector? >> >>>>>> >> >>>>>> This means the overseer wasn't able to communicate with zk for 15 >> >>>>> seconds - due to load or gc or whatever. If you can't resolve the >> root >> >>>>> cause of that, or the load just won't allow for it, next best thing >> >>> you can >> >>>>> do is raise it to 30 seconds. >> >>>>>> >> >>>>>> - Mark >> >>>>>> >> >>>>>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2...@gmail.com> >> wrote: >> >>>>>> >> >>>>>>> I am occasionally seeing this in the log, is this just a timeout >> >>> issue? >> >>>>>>> Should I be increasing the zk client timeout? 
>> >>>>>>> >> >>>>>>> WARNING: Overseer cannot talk to ZK >> >>>>>>> Apr 3, 2013 11:14:25 PM >> >>>>>>> org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process >> >>>>>>> INFO: Watcher fired on path: null state: Expired type None >> >>>>>>> Apr 3, 2013 11:14:25 PM >> >>>>> org.apache.solr.cloud.Overseer$ClusterStateUpdater >> >>>>>>> run >> >>>>>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop >> >>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException: >> >>>>>>> KeeperErrorCode = Session expired for /overseer/queue >> >>>>>>> at >> >>>>>>> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:127) >> >>>>>>> at >> >>>>>>> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >> >>>>>>> at >> >>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131) >> >>>>>>> at >> >>>>>>> >> >>> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326) >> >>>>>>> at >> >>>>>>> >> >>>>> >> >>> >> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128) >> >>>>>>> at java.lang.Thread.run(Thread.java:662) >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2...@gmail.com> >> >>>>> wrote: >> >>>>>>> >> >>>>>>>> just an update, I'm at 1M records now with no issues. This looks >> >>>>>>>> promising as to the cause of my issues, thanks for the help. Is >> the >> >>>>>>>> routing method with numShards documented anywhere? I know >> >>> numShards is >> >>>>>>>> documented but I didn't know that the routing changed if you >> don't >> >>>>> specify >> >>>>>>>> it. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2...@gmail.com >> > >> >>>>> wrote: >> >>>>>>>> >> >>>>>>>>> with these changes things are looking good, I'm up to 600,000 >> >>>>> documents >> >>>>>>>>> without any issues as of right now. I'll keep going and add >> more >> >>> to >> >>>>> see if >> >>>>>>>>> I find anything. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson < >> jej2...@gmail.com> >> >>>>> wrote: >> >>>>>>>>> >> >>>>>>>>>> ok, so that's not a deal breaker for me. I just changed it to >> >>> match >> >>>>> the >> >>>>>>>>>> shards that are auto created and it looks like things are >> happy. >> >>>>> I'll go >> >>>>>>>>>> ahead and try my test to see if I can get things out of sync. >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller < >> >>> markrmil...@gmail.com >> >>>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>>> I had thought you could - but looking at the code recently, I >> >>> don't >> >>>>>>>>>>> think you can anymore. I think that's a technical limitation >> more >> >>>>> than >> >>>>>>>>>>> anything though. 
When these changes were made, I think support >> >>> for >> >>>>> that was >> >>>>>>>>>>> simply not added at the time. >> >>>>>>>>>>> >> >>>>>>>>>>> I'm not sure exactly how straightforward it would be, but it >> >>> seems >> >>>>>>>>>>> doable - as it is, the overseer will preallocate shards when >> >>> first >> >>>>> creating >> >>>>>>>>>>> the collection - that's when they get named shard(n). There >> would >> >>>>> have to >> >>>>>>>>>>> be logic to replace shard(n) with the custom shard name when >> the >> >>>>> core >> >>>>>>>>>>> actually registers. >> >>>>>>>>>>> >> >>>>>>>>>>> - Mark >> >>>>>>>>>>> >> >>>>>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> >> >>>>> wrote: >> >>>>>>>>>>> >> >>>>>>>>>>>> answered my own question, it now says compositeId. What is >> >>>>>>>>>>> problematic >> >>>>>>>>>>>> though is that in addition to my shards (which are say >> >>>>> jamie-shard1) >> >>>>>>>>>>> I see >> >>>>>>>>>>>> the solr created shards (shard1). I assume that these were >> >>> created >> >>>>>>>>>>> because >> >>>>>>>>>>>> of the numShards param. Is there no way to specify the >> names of >> >>>>> these >> >>>>>>>>>>>> shards? >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson < >> >>> jej2...@gmail.com> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>>> ah interesting....so I need to specify num shards, blow out >> zk >> >>> and >> >>>>>>>>>>> then >> >>>>>>>>>>>>> try this again to see if things work properly now. What is >> >>> really >> >>>>>>>>>>> strange >> >>>>>>>>>>>>> is that for the most part things have worked right and on >> >>> 4.2.1 I >> >>>>>>>>>>> have >> >>>>>>>>>>>>> 600,000 items indexed with no duplicates. In any event I >> will >> >>>>>>>>>>> specify num >> >>>>>>>>>>>>> shards clear out zk and begin again. If this works properly >> >>> what >> >>>>>>>>>>> should >> >>>>>>>>>>>>> the router type be? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller < >> >>>>> markrmil...@gmail.com> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>>> If you don't specify numShards after 4.1, you get an >> implicit >> >>> doc >> >>>>>>>>>>> router >> >>>>>>>>>>>>>> and it's up to you to distribute updates. In the past, >> >>>>> partitioning >> >>>>>>>>>>> was >> >>>>>>>>>>>>>> done on the fly - but for shard splitting and perhaps other >> >>>>>>>>>>> features, we >> >>>>>>>>>>>>>> now divvy up the hash range up front based on numShards and >> >>> store >> >>>>>>>>>>> it in >> >>>>>>>>>>>>>> ZooKeeper. No numShards is now how you take complete >> control >> >>> of >> >>>>>>>>>>> updates >> >>>>>>>>>>>>>> yourself. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson < >> jej2...@gmail.com> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> The router says "implicit". I did start from a blank zk >> >>> state >> >>>>> but >> >>>>>>>>>>>>>> perhaps >> >>>>>>>>>>>>>>> I missed one of the ZkCLI commands? One of my shards from >> >>> the >> >>>>>>>>>>>>>>> clusterstate.json is shown below. What is the process >> that >> >>>>> should >> >>>>>>>>>>> be >> >>>>>>>>>>>>>> done >> >>>>>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I >> listed >> >>>>>>>>>>> above? 
My >> >>>>>>>>>>>>>>> process right now is run those ZkCLI commands and then >> start >> >>>>> solr >> >>>>>>>>>>> on >> >>>>>>>>>>>>>> all of >> >>>>>>>>>>>>>>> the instances with a command like this >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1 >> >>>>>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1 >> >>>>>>>>>>>>>> -Dcollection.configName=solr-conf >> >>>>>>>>>>>>>>> -Dcollection=collection1 >> >>>>>>>>>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 >> >>>>>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> I feel like maybe I'm missing a step. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> "shard5":{ >> >>>>>>>>>>>>>>> "state":"active", >> >>>>>>>>>>>>>>> "replicas":{ >> >>>>>>>>>>>>>>> "10.38.33.16:7575_solr_shard5-core1":{ >> >>>>>>>>>>>>>>> "shard":"shard5", >> >>>>>>>>>>>>>>> "state":"active", >> >>>>>>>>>>>>>>> "core":"shard5-core1", >> >>>>>>>>>>>>>>> "collection":"collection1", >> >>>>>>>>>>>>>>> "node_name":"10.38.33.16:7575_solr", >> >>>>>>>>>>>>>>> "base_url":"http://10.38.33.16:7575/solr", >> >>>>>>>>>>>>>>> "leader":"true"}, >> >>>>>>>>>>>>>>> "10.38.33.17:7577_solr_shard5-core2":{ >> >>>>>>>>>>>>>>> "shard":"shard5", >> >>>>>>>>>>>>>>> "state":"recovering", >> >>>>>>>>>>>>>>> "core":"shard5-core2", >> >>>>>>>>>>>>>>> "collection":"collection1", >> >>>>>>>>>>>>>>> "node_name":"10.38.33.17:7577_solr", >> >>>>>>>>>>>>>>> "base_url":"http://10.38.33.17:7577/solr"}}} >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller < >> >>>>> markrmil...@gmail.com >> >>>>>>>>>>>> >> >>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> It should be part of your clusterstate.json. Some users >> have >> >>>>>>>>>>> reported >> >>>>>>>>>>>>>>>> trouble upgrading a previous zk install when this change >> >>> came. >> >>>>> I >> >>>>>>>>>>>>>>>> recommended manually updating the clusterstate.json to >> have >> >>> the >> >>>>>>>>>>> right >> >>>>>>>>>>>>>> info, >> >>>>>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to >> >>> start >> >>>>>>>>>>> from a >> >>>>>>>>>>>>>> clean >> >>>>>>>>>>>>>>>> zk state. >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> If you don't have that range information, I think there >> >>> will be >> >>>>>>>>>>>>>> trouble. >> >>>>>>>>>>>>>>>> Do you have an router type defined in the >> clusterstate.json? >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson < >> >>> jej2...@gmail.com> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Where is this information stored in ZK? I don't see it >> in >> >>> the >> >>>>>>>>>>> cluster >> >>>>>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ). >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Perhaps something with my process is broken. What I do >> >>> when I >> >>>>>>>>>>> start >> >>>>>>>>>>>>>> from >> >>>>>>>>>>>>>>>>> scratch is the following >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> ZkCLI -cmd upconfig ... >> >>>>>>>>>>>>>>>>> ZkCLI -cmd linkconfig .... >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> but I don't ever explicitly create the collection. What >> >>>>> should >> >>>>>>>>>>> the >> >>>>>>>>>>>>>> steps >> >>>>>>>>>>>>>>>>> from scratch be? 
I am moving from an unreleased >> snapshot >> >>> of >> >>>>> 4.0 >> >>>>>>>>>>> so I >> >>>>>>>>>>>>>>>> never >> >>>>>>>>>>>>>>>>> did that previously either so perhaps I did create the >> >>>>>>>>>>> collection in >> >>>>>>>>>>>>>> one >> >>>>>>>>>>>>>>>> of >> >>>>>>>>>>>>>>>>> my steps to get this working but have forgotten it along >> >>> the >> >>>>> way. >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller < >> >>>>>>>>>>> markrmil...@gmail.com> >> >>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are >> >>> assigned up >> >>>>>>>>>>> front >> >>>>>>>>>>>>>>>> when a >> >>>>>>>>>>>>>>>>>> collection is created - each shard gets a range, which >> is >> >>>>>>>>>>> stored in >> >>>>>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the >> same >> >>> id >> >>>>> on >> >>>>>>>>>>>>>>>> different >> >>>>>>>>>>>>>>>>>> shards - something very odd going on. >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you >> >>> reproduce. >> >>>>>>>>>>> Ideally >> >>>>>>>>>>>>>> we >> >>>>>>>>>>>>>>>>>> can capture it in a test case. >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson < >> >>> jej2...@gmail.com >> >>>>>> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with >> the >> >>>>>>>>>>> parameter >> >>>>>>>>>>>>>> set I >> >>>>>>>>>>>>>>>>>> am >> >>>>>>>>>>>>>>>>>>> seeing this behavior. I've been able to duplicate it >> on >> >>>>> 4.2.0 >> >>>>>>>>>>> by >> >>>>>>>>>>>>>>>>>> indexing >> >>>>>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I >> get >> >>> to >> >>>>>>>>>>> 400,000 >> >>>>>>>>>>>>>> or >> >>>>>>>>>>>>>>>>>> so. >> >>>>>>>>>>>>>>>>>>> I will try this on 4.2.1. to see if I see the same >> >>> behavior >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson < >> >>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> Since I don't have that many items in my index I >> >>> exported >> >>>>> all >> >>>>>>>>>>> of >> >>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>> keys >> >>>>>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that >> >>> checks >> >>>>> for >> >>>>>>>>>>>>>>>>>> duplicates. >> >>>>>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a >> grep >> >>> of >> >>>>> the >> >>>>>>>>>>>>>> files >> >>>>>>>>>>>>>>>> for >> >>>>>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the >> >>> wrong >> >>>>>>>>>>> places. >> >>>>>>>>>>>>>>>> If >> >>>>>>>>>>>>>>>>>> you >> >>>>>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and >> >>> shard >> >>>>> 5. >> >>>>>>>>>>> Is >> >>>>>>>>>>>>>> it >> >>>>>>>>>>>>>>>>>>>> possible that the hash is being calculated taking >> into >> >>>>>>>>>>> account only >> >>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>> "live" nodes? I know that we don't specify the >> >>> numShards >> >>>>>>>>>>> param @ >> >>>>>>>>>>>>>>>>>> startup >> >>>>>>>>>>>>>>>>>>>> so could this be what is happening? 
>> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" * >> >>>>>>>>>>>>>>>>>>>> shard1-core1:0 >> >>>>>>>>>>>>>>>>>>>> shard1-core2:0 >> >>>>>>>>>>>>>>>>>>>> shard2-core1:0 >> >>>>>>>>>>>>>>>>>>>> shard2-core2:0 >> >>>>>>>>>>>>>>>>>>>> shard3-core1:1 >> >>>>>>>>>>>>>>>>>>>> shard3-core2:1 >> >>>>>>>>>>>>>>>>>>>> shard4-core1:0 >> >>>>>>>>>>>>>>>>>>>> shard4-core2:0 >> >>>>>>>>>>>>>>>>>>>> shard5-core1:1 >> >>>>>>>>>>>>>>>>>>>> shard5-core2:1 >> >>>>>>>>>>>>>>>>>>>> shard6-core1:0 >> >>>>>>>>>>>>>>>>>>>> shard6-core2:0 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson < >> >>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I >> just >> >>>>>>>>>>> indexed >> >>>>>>>>>>>>>>>> 300,000 >> >>>>>>>>>>>>>>>>>>>>> items, and some how 300,020 ended up in the index. >> I >> >>>>> thought >> >>>>>>>>>>>>>>>> perhaps I >> >>>>>>>>>>>>>>>>>>>>> messed something up so I started the indexing again >> and >> >>>>>>>>>>> indexed >> >>>>>>>>>>>>>>>> another >> >>>>>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs. Is there a good >> way to >> >>>>> find >> >>>>>>>>>>>>>>>> possibile >> >>>>>>>>>>>>>>>>>>>>> duplicates? I had tried to facet on key (our id >> field) >> >>>>> but >> >>>>>>>>>>> that >> >>>>>>>>>>>>>>>> didn't >> >>>>>>>>>>>>>>>>>>>>> give me anything with more than a count of 1. >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson < >> >>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things >> to >> >>> go >> >>>>>>>>>>> again. >> >>>>>>>>>>>>>> I >> >>>>>>>>>>>>>>>> am >> >>>>>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the >> >>>>> problem on >> >>>>>>>>>>>>>> 4.2.0 >> >>>>>>>>>>>>>>>>>> and then >> >>>>>>>>>>>>>>>>>>>>>> I'll try on 4.2.1 >> >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller < >> >>>>>>>>>>>>>> markrmil...@gmail.com >> >>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> No, not that I know if, which is why I say we >> need to >> >>>>> get >> >>>>>>>>>>> to the >> >>>>>>>>>>>>>>>>>> bottom >> >>>>>>>>>>>>>>>>>>>>>>> of it. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson < >> >>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> Mark >> >>>>>>>>>>>>>>>>>>>>>>>> It's there a particular jira issue that you think >> >>> may >> >>>>>>>>>>> address >> >>>>>>>>>>>>>>>> this? >> >>>>>>>>>>>>>>>>>> I >> >>>>>>>>>>>>>>>>>>>>>>> read >> >>>>>>>>>>>>>>>>>>>>>>>> through it quickly but didn't see one that jumped >> >>> out >> >>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" < >> >>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it >> did >> >>>>>>>>>>> nothing. I >> >>>>>>>>>>>>>> can >> >>>>>>>>>>>>>>>>>>>>>>> clear >> >>>>>>>>>>>>>>>>>>>>>>>>> the index and try4.2.1. 
I will save off the logs >> >>> and >> >>>>> see >> >>>>>>>>>>> if >> >>>>>>>>>>>>>> there >> >>>>>>>>>>>>>>>>>> is >> >>>>>>>>>>>>>>>>>>>>>>>>> anything else odd >> >>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" < >> >>>>>>>>>>> markrmil...@gmail.com> >> >>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have >> >>> said. >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be >> >>> best >> >>>>> to >> >>>>>>>>>>> start >> >>>>>>>>>>>>>>>>>>>>>>> tracking in >> >>>>>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well. >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back >> >>>>> again. >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we >> really >> >>>>> need >> >>>>>>>>>>> to >> >>>>>>>>>>>>>> get >> >>>>>>>>>>>>>>>> to >> >>>>>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's >> >>>>> fixed in >> >>>>>>>>>>>>>> 4.2.1 >> >>>>>>>>>>>>>>>>>>>>>>> (spreading >> >>>>>>>>>>>>>>>>>>>>>>>>>> to mirrors now). >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson < >> >>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is >> >>> there >> >>>>>>>>>>> anything >> >>>>>>>>>>>>>>>> else >> >>>>>>>>>>>>>>>>>>>>>>> that I >> >>>>>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug? >> >>> I'd >> >>>>> be >> >>>>>>>>>>> happy >> >>>>>>>>>>>>>> to >> >>>>>>>>>>>>>>>>>>>>>>> troll >> >>>>>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information >> is >> >>>>>>>>>>> needed, just >> >>>>>>>>>>>>>>>> let >> >>>>>>>>>>>>>>>>>> me >> >>>>>>>>>>>>>>>>>>>>>>>>>> know. >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to >> >>> fix >> >>>>>>>>>>> this. >> >>>>>>>>>>>>>> Is it >> >>>>>>>>>>>>>>>>>>>>>>>>>> required to >> >>>>>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let >> solr >> >>>>> resync >> >>>>>>>>>>>>>> things? >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson >> < >> >>>>>>>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here.... >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having >> issues >> >>>>>>>>>>> with... 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM >> >>>>>>>>>>> org.apache.solr.common.SolrException >> >>>>>>>>>>>>>>>> log >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException >> >>>>>>>>>>>>>>>>>>>>>>>>>> : >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Server at >> >>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2returned >> >>>>>>>>>>>>>>>>>>>>>>> non >> >>>>>>>>>>>>>>>>>>>>>>>>>> ok >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> status:503, message:Service Unavailable >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie >> Johnson < >> >>>>>>>>>>>>>>>>>> jej2...@gmail.com> >> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM >> >>>>>>>>>>>>>> org.apache.solr.common.SolrException >> >>>>>>>>>>>>>>>> log >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: >> org.apache.solr.common.SolrException: >> >>>>>>>>>>> ClusterState >> >>>>>>>>>>>>>>>> says >> >>>>>>>>>>>>>>>>>>>>>>> we are >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the leader, but locally we don't think so >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>> >> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>> org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> at >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie >> Johnson < >> >>>>>>>>>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some >> >>> point >> >>>>>>>>>>> there >> >>>>>>>>>>>>>> were >> >>>>>>>>>>>>>>>>>>>>>>> shards >> >>>>>>>>>>>>>>>>>>>>>>>>>> that >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> went down. I am seeing things like what is >> >>>>> below. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> NFO: A cluster state change: WatchedEvent >> >>>>>>>>>>>>>>>> state:SyncConnected >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, >> has >> >>>>>>>>>>> occurred - >> >>>>>>>>>>>>>>>>>>>>>>>>>> updating... (live >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nodes size: 12) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.common.cloud.ZkStateReader$3 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9) >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the >> >>> leader. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, >> it's >> >>>>> okay >> >>>>>>>>>>> to be >> >>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>> leader. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> org.apache.solr.cloud.ShardLeaderElectionContext >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and >> sync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark >> Miller < >> >>>>>>>>>>>>>>>>>>>>>>> markrmil...@gmail.com >> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are >> thinking >> >>> of >> >>>>>>>>>>> apply >> >>>>>>>>>>>>>> here. 
>> >>>>>>>>>>>>>>>>>>>>>>> Peersync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not look at that - it looks at >> version >> >>>>>>>>>>> numbers for >> >>>>>>>>>>>>>>>>>>>>>>> updates in >> >>>>>>>>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transaction log - it compares the last >> 100 of >> >>>>> them >> >>>>>>>>>>> on >> >>>>>>>>>>>>>>>> leader >> >>>>>>>>>>>>>>>>>>>>>>> and >> >>>>>>>>>>>>>>>>>>>>>>>>>> replica. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica >> seems to >> >>>>> have >> >>>>>>>>>>>>>> versions >> >>>>>>>>>>>>>>>>>>>>>>> that >> >>>>>>>>>>>>>>>>>>>>>>>>>> the leader >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. Have you scanned the logs for >> any >> >>>>>>>>>>> interesting >> >>>>>>>>>>>>>>>>>>>>>>> exceptions? >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy >> >>> indexing? >> >>>>>>>>>>> Did >> >>>>>>>>>>>>>> any zk >> >>>>>>>>>>>>>>>>>>>>>>> session >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> timeouts occur? >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson >> < >> >>>>>>>>>>>>>>>> jej2...@gmail.com >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr >> >>>>> cluster >> >>>>>>>>>>> to >> >>>>>>>>>>>>>> 4.2 >> >>>>>>>>>>>>>>>> and >> >>>>>>>>>>>>>>>>>>>>>>>>>> noticed a >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strange issue while testing today. >> >>>>> Specifically >> >>>>>>>>>>> the >> >>>>>>>>>>>>>>>> replica >> >>>>>>>>>>>>>>>>>>>>>>> has a >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> higher >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version than the master which is causing >> the >> >>>>>>>>>>> index to >> >>>>>>>>>>>>>> not >> >>>>>>>>>>>>>>>>>>>>>>>>>> replicate. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer >> >>> documents >> >>>>>>>>>>> than >> >>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>> master. >> >>>>>>>>>>>>>>>>>>>>>>>>>> What >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could cause this and how can I resolve it >> >>>>> short of >> >>>>>>>>>>>>>> taking >> >>>>>>>>>>>>>>>>>>>>>>> down the >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and scping the right version in? 
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified:about an hour ago >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164880 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164880 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:2387 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:23 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164773 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164773 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:3001 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:30 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the replicas log it says this: >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>> >> >>> >> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>>>>> sync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> url= >> >>> http://10.38.33.17:7577/solrSTARTreplicas=[ >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>> http://10.38.33.16:7575/solr/dsc-shard5-core1/ >> >>>>> ] >> >>>>>>>>>>>>>>>>>> nUpdates=100 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>>>>>>>>>>>>>>> handleVersions >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >> url= >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Received 100 versions from >> >>>>>>>>>>>>>>>>>>>>>>> 10.38.33.16:7575/solr/dsc-shard5-core1/ >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>>>>>>>>>>>>>>> handleVersions >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >> url= >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr Our >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> versions are newer. >> >>>>>>>>>>> ourLowThreshold=1431233788792274944 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM >> >>>>>>>>>>> org.apache.solr.update.PeerSync >> >>>>>>>>>>>>>>>> sync >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> url=http://10.38.33.17:7577/solrDONE. 
>> sync >> >>>>>>>>>>> succeeded >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which again seems to point that it >> thinks it >> >>>>> has a >> >>>>>>>>>>>>>> newer >> >>>>>>>>>>>>>>>>>>>>>>> version of >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index so it aborts. This happened while >> >>>>> having 10 >> >>>>>>>>>>>>>> threads >> >>>>>>>>>>>>>>>>>>>>>>> indexing >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 10,000 >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> items writing to a 6 shard (1 replica >> each) >> >>>>>>>>>>> cluster. >> >>>>>>>>>>>>>> Any >> >>>>>>>>>>>>>>>>>>>>>>> thoughts >> >>>>>>>>>>>>>>>>>>>>>>>>>> on >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or what I should look for would be >> >>> appreciated. >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>>>> >> >>>>>> >> >>>>> >> >>>>> >> >>> >> >>> >> >> >> >> >
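For anyone hitting the same ZooKeeper session expirations under heavy indexing, the advice in this thread boils down to startup options along these lines. This is a sketch with placeholder values (heap size, shard count and ZooKeeper hosts are examples, and -DzkClientTimeout assumes a solr.xml that reads the zkClientTimeout system property):

java -server -Xmx4g -XX:+UseConcMarkSweepGC \
  -DzkClientTimeout=30000 \
  -DnumShards=6 \
  -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
  -Dcollection.configName=solr-conf -Dcollection=collection1 \
  -jar start.jar

That is: use the concurrent low-pause collector so stop-the-world pauses stay well under the ZooKeeper session timeout, raise that timeout from the 15 seconds discussed above to 30 seconds if expirations still occur under load, and pass numShards up front if you want the compositeId router with hash ranges stored in ZooKeeper; leaving numShards off gives the implicit router, where distributing updates is entirely up to you.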