Well, you can always manually change the ZK nodes, but whether just setting a
node's state to "leader" in ZK and then starting the Solr instance hosting
that node would work... I don't know. Do consider running CheckIndex on one of
the replicas in question first, though.
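For reference, CheckIndex is normally run from the command line against a
local copy of the index, but it can also be driven from a few lines of Java.
A minimal sketch, assuming the replica's data/index directory has first been
copied out of HDFS to local disk (the path argument and class name are just
placeholders):

    import java.nio.file.Paths;

    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckReplicaIndex {
        public static void main(String[] args) throws Exception {
            // args[0]: local copy of the replica's data/index directory,
            // e.g. pulled down from HDFS beforehand.
            try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
                 CheckIndex checker = new CheckIndex(dir)) {
                checker.setInfoStream(System.out);  // per-segment details
                CheckIndex.Status status = checker.checkIndex();  // read-only
                System.out.println(status.clean
                        ? "Index is clean"
                        : "Index has problems; see the report above");
            }
        }
    }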
Best,
Erick

On Tue, Nov 21, 2017 at 3:06 PM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
> One other data point I just saw on one of the nodes. It has the
> following error:
>
> 2017-11-21 22:59:48.886 ERROR
> (coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS
> s:shard14 r:core_node175 x:UNCLASS_shard14_replica3]
> o.a.s.c.ShardLeaderElectionContext There was a problem trying to
> register as the leader:org.apache.solr.common.SolrException: Leader
> Initiated Recovery prevented leadership
>     at org.apache.solr.cloud.ShardLeaderElectionContext.checkLIR(ElectionContext.java:521)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:424)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>
> This stack trace repeats for a long while; it looks like a recursive call.
>
> -Joe
>
> On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
>> We sometimes also have replicas that do not recover. If one replica is
>> left active, the easiest fix is to delete the stuck replica and create a
>> new one. When all replicas are down, it usually helps to restart one of
>> the nodes that hosts a replica in the down state. If that still doesn't
>> get the replica to recover, I would check the logs of that node and also
>> those of the overseer node. I have seen the same issue on Solr using
>> local storage. The main HDFS-related issues we have had so far were the
>> lock files, and that when you delete and recreate collections/cores the
>> data is sometimes not cleaned up in HDFS, which then causes a conflict.
>>
>> Hendrik
>>
>> On 21.11.2017 21:07, Joe Obernberger wrote:
>>> We've never run an index this size in anything but HDFS, so I have no
>>> comparison. What we've been doing is keeping two main collections - all
>>> data, and the last 30 days of data. Then we handle queries based on
>>> date range. The 30-day index is significantly faster.
>>>
>>> My main concern right now is that 6 of the 100 shards are not coming
>>> back because of no leader. I've never seen this error before. Any
>>> ideas? ClusterStatus shows all three replicas with state 'down'.
>>>
>>> Thanks!
>>>
>>> -joe
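If it comes to the delete-and-recreate approach Hendrik describes above, that
can be scripted with SolrJ. A rough sketch, only sensible while the shard
still has an active replica to sync from; the collection/shard/replica names
are taken from the error above, and the ZK connect string is a placeholder:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class ReplaceStuckReplica {
        public static void main(String[] args) throws Exception {
            // Placeholder ZK connect string; use the real ensemble/chroot.
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
                // Drop the replica that refuses to recover...
                CollectionAdminRequest
                        .deleteReplica("UNCLASS", "shard14", "core_node175")
                        .process(client);
                // ...and add a fresh one, which replicates from the leader.
                CollectionAdminRequest
                        .addReplicaToShard("UNCLASS", "shard14")
                        .process(client);
            }
        }
    }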
>>>
>>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>>>> We actually also have some performance issues with HDFS at the moment.
>>>> We are doing lots of soft commits for NRT search, and those seem to be
>>>> slower than with local storage. The investigation is not very far
>>>> along yet, however.
>>>>
>>>> We have a setup with 2000 collections, each with one shard and a
>>>> replication factor of 2 or 3. When we restart nodes too quickly, that
>>>> causes problems with the overseer queue, which can lead to the queue
>>>> getting out of control and Solr pretty much dying. We are still on
>>>> Solr 6.3; 6.6 has some improvements and should handle these actions
>>>> faster. I would check what you see for
>>>> "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical
>>>> part is the "overseer_queue_size" value. If this goes up to about
>>>> 10000, it is pretty much game over on our setup. In that case it seems
>>>> best to stop all nodes, clear the queue in ZK, and then restart the
>>>> nodes one by one with a gap of about 5 minutes. That normally recovers
>>>> pretty well.
>>>>
>>>> regards,
>>>> Hendrik
>>>>
>>>> On 21.11.2017 20:12, Joe Obernberger wrote:
>>>>> We set the hard commit time long because we were having performance
>>>>> issues with HDFS, and thought that since the block size is 128M, a
>>>>> longer hard commit interval made sense. That was our hypothesis
>>>>> anyway. Happy to switch it back and see what happens.
>>>>>
>>>>> I don't know what caused the cluster to go into recovery in the first
>>>>> place. We had a server die over the weekend, but it's just one out of
>>>>> ~50. Every shard is 3x replicated (and 3x replicated in HDFS... so 9
>>>>> copies). It was at this point that we noticed lots of network
>>>>> activity, and most of the shards in this recovery, fail, retry loop.
>>>>> That is when we decided to shut it down, resulting in zombie lock
>>>>> files.
>>>>>
>>>>> I tried using the FORCELEADER call, which completed but doesn't seem
>>>>> to have any effect on the shards that have no leader. Kind of out of
>>>>> ideas for that problem. If I can get the cluster back up, I'll try a
>>>>> lower hard commit time. Thanks again Erick!
>>>>>
>>>>> -Joe
>>>>>
>>>>> On 11/21/2017 2:00 PM, Erick Erickson wrote:
>>>>>> Frankly, with HDFS I'm a bit out of my depth, so listen to Hendrik ;)...
>>>>>>
>>>>>> I need to back up a bit. Once nodes are in this state it's not
>>>>>> surprising that they need to be forcefully killed. I was more
>>>>>> thinking about how they got into this situation in the first place.
>>>>>> _Before_ you get into the nasty state, how are the Solr nodes shut
>>>>>> down? Forcefully?
>>>>>>
>>>>>> Your hard commit interval is far longer than it needs to be,
>>>>>> resulting in much larger tlog files etc. I usually set this to 15-60
>>>>>> seconds with local disks; I'm not quite sure whether longer
>>>>>> intervals are helpful on HDFS. What this means is that you can spend
>>>>>> up to 30 minutes when you restart Solr _replaying the tlogs_! If
>>>>>> Solr is killed, it may not have had a chance to fsync the segments
>>>>>> and may have to replay on startup. If you have openSearcher set to
>>>>>> false, the hard commit operation is not horribly expensive; it just
>>>>>> fsyncs the current segments and opens new ones. It won't be a total
>>>>>> cure, but I bet reducing this interval would help a lot.
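The manual commit mentioned in the next paragraph can also be issued from
SolrJ as part of a shutdown script, so the tlogs are essentially empty
before the nodes are stopped. A minimal sketch; the ZK connect string is a
placeholder and the collection name is taken from the logs above:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class PreShutdownCommit {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
                // Hard commit: waitFlush=true, waitSearcher=false,
                // softCommit=false. Flushes segments so there is little or
                // nothing left in the tlogs to replay on the next startup.
                client.commit("UNCLASS", true, false, false);
            }
        }
    }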
>>>>>>
>>>>>> Also, if you stop indexing there's no need to wait 30 minutes if
>>>>>> you issue a manual commit, something like
>>>>>> .../collection/update?commit=true. Just reducing the hard commit
>>>>>> interval will make the wait between stopping indexing and
>>>>>> restarting shorter all by itself if you don't want to issue the
>>>>>> manual commit.
>>>>>>
>>>>>> Best,
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
>>>>>> <hendrik.hadd...@gmx.net> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I see the write.lock issue as well when Solr has not been stopped
>>>>>>> gracefully. The write.lock files are then left behind in HDFS, as
>>>>>>> they are not removed automatically when the client disconnects,
>>>>>>> the way an ephemeral node in ZooKeeper would be. Unfortunately,
>>>>>>> Solr also does not realize that it should own the lock, even
>>>>>>> though it is marked as the owner in the state stored in ZooKeeper,
>>>>>>> and it is not willing to retry either, which is why you need to
>>>>>>> restart the whole Solr instance after the cleanup. I added some
>>>>>>> logic to my Solr start-up script that scans the lock files in
>>>>>>> HDFS, compares them with the state in ZooKeeper, and then deletes
>>>>>>> all lock files that belong to the node I'm starting.
>>>>>>>
>>>>>>> regards,
>>>>>>> Hendrik
>>>>>>>
>>>>>>> On 21.11.2017 14:07, Joe Obernberger wrote:
>>>>>>>> Hi All - we have a system with 45 physical boxes running Solr
>>>>>>>> 6.6.1 using HDFS as the index. The current index size is about
>>>>>>>> 31 TBytes; with 3x replication that takes up 93 TBytes of disk.
>>>>>>>> Our main collection is split across 100 shards with 3 replicas
>>>>>>>> each. The issue we're running into is when restarting the Solr 6
>>>>>>>> cluster. The shards go into recovery and start to utilize nearly
>>>>>>>> all of their network interfaces. If we start too many of the
>>>>>>>> nodes at once, the shards will go into a recovery, fail, and
>>>>>>>> retry loop and never come up. The errors are related to HDFS not
>>>>>>>> responding fast enough, plus warnings from the DFSClient. If we
>>>>>>>> stop a node while this is happening, the script will force a stop
>>>>>>>> (180 second timeout) and upon restart we have lock files
>>>>>>>> (write.lock) inside of HDFS.
>>>>>>>>
>>>>>>>> The process at this point is to start one node, find the lock
>>>>>>>> files, wait for it to come up completely (hours), stop it, delete
>>>>>>>> the write.lock files, and restart. Usually this second restart is
>>>>>>>> faster, but it can still take 20-60 minutes.
>>>>>>>>
>>>>>>>> The smaller indexes recover much faster (less than 5 minutes).
>>>>>>>> Should we have not used so many replicas with HDFS? Is there a
>>>>>>>> better way we should have built the Solr 6 cluster?
>>>>>>>>
>>>>>>>> Thank you for any insight!
>>>>>>>>
>>>>>>>> -Joe
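Along the lines of Hendrik's start-up script, the HDFS side of that cleanup
could look roughly like the sketch below. The namenode URI and Solr root
path are placeholders, and the cross-check against the ZooKeeper state
(only deleting locks for cores owned by the node being started) is left out:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class StaleLockCleaner {
        public static void main(String[] args) throws Exception {
            // Placeholder root of the Solr indexes in HDFS.
            Path solrRoot = new Path("hdfs://namenode:8020/solr");
            try (FileSystem fs =
                    FileSystem.get(solrRoot.toUri(), new Configuration())) {
                RemoteIterator<LocatedFileStatus> it =
                        fs.listFiles(solrRoot, true);  // recursive listing
                while (it.hasNext()) {
                    Path p = it.next().getPath();
                    // Filter to the cores this node owns (per ZK) before
                    // deleting; that check is omitted in this sketch.
                    if (p.getName().equals("write.lock")) {
                        System.out.println("Deleting stale lock: " + p);
                        fs.delete(p, false);  // non-recursive delete
                    }
                }
            }
        }
    }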