Well, you can always manually change the ZK nodes, but whether just setting a
node's state to "leader" in ZK and then starting the Solr instance hosting
that node would work... I don't know. Do consider running CheckIndex on one of
the replicas in question first, though.
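For reference, CheckIndex is normally run from the command line against a
local copy of the index, but it can also be driven from a few lines of Java.
A minimal sketch, assuming the replica's data/index directory has first been
copied out of HDFS to local disk (the path argument and class name are just
placeholders):

    import java.nio.file.Paths;

    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckReplicaIndex {
        public static void main(String[] args) throws Exception {
            // args[0]: local copy of the replica's data/index directory,
            // e.g. pulled down from HDFS beforehand.
            try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
                 CheckIndex checker = new CheckIndex(dir)) {
                checker.setInfoStream(System.out);  // per-segment details
                CheckIndex.Status status = checker.checkIndex();  // read-only
                System.out.println(status.clean
                        ? "Index is clean"
                        : "Index has problems; see the report above");
            }
        }
    }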
Best,
Erick

On Tue, Nov 21, 2017 at 3:06 PM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
> One other data point I just saw on one of the nodes. It has the
> following error:
>
> 2017-11-21 22:59:48.886 ERROR
> (coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS
> s:shard14 r:core_node175 x:UNCLASS_shard14_replica3]
> o.a.s.c.ShardLeaderElectionContext There was a problem trying to
> register as the leader:org.apache.solr.common.SolrException: Leader
> Initiated Recovery prevented leadership
>     at org.apache.solr.cloud.ShardLeaderElectionContext.checkLIR(ElectionContext.java:521)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:424)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>
> This stack trace repeats for a long while; it looks like a recursive call.
>
> -Joe
>
> On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
>> We sometimes also have replicas that do not recover. If one replica is
>> left active, the easiest fix is to delete the stuck replica and create a
>> new one. When all replicas are down, it usually helps to restart one of
>> the nodes that hosts a replica in the down state. If that still doesn't
>> get the replica to recover, I would check the logs of that node and also
>> those of the overseer node. I have seen the same issue on Solr using
>> local storage. The main HDFS-related issues we have had so far were the
>> lock files, and that when you delete and recreate collections/cores the
>> data is sometimes not cleaned up in HDFS, which then causes a conflict.
>>
>> Hendrik
>>
>> On 21.11.2017 21:07, Joe Obernberger wrote:
>>> We've never run an index this size in anything but HDFS, so I have no
>>> comparison. What we've been doing is keeping two main collections - all
>>> data, and the last 30 days of data. Then we handle queries based on
>>> date range. The 30-day index is significantly faster.
>>>
>>> My main concern right now is that 6 of the 100 shards are not coming
>>> back because of no leader. I've never seen this error before. Any
>>> ideas? ClusterStatus shows all three replicas with state 'down'.
>>>
>>> Thanks!
>>>
>>> -joe
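If it comes to the delete-and-recreate approach Hendrik describes above, that
can be scripted with SolrJ. A rough sketch, only sensible while the shard
still has an active replica to sync from; the collection/shard/replica names
are taken from the error above, and the ZK connect string is a placeholder:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class ReplaceStuckReplica {
        public static void main(String[] args) throws Exception {
            // Placeholder ZK connect string; use the real ensemble/chroot.
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
                // Drop the replica that refuses to recover...
                CollectionAdminRequest
                        .deleteReplica("UNCLASS", "shard14", "core_node175")
                        .process(client);
                // ...and add a fresh one, which replicates from the leader.
                CollectionAdminRequest
                        .addReplicaToShard("UNCLASS", "shard14")
                        .process(client);
            }
        }
    }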
>>>
>>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>>>> We actually also have some performance issues with HDFS at the moment.
>>>> We are doing lots of soft commits for NRT search, and those seem to be
>>>> slower than with local storage. The investigation is not very far
>>>> along yet, however.
>>>>
>>>> We have a setup with 2000 collections, each with one shard and a
>>>> replication factor of 2 or 3. When we restart nodes too quickly, that
>>>> causes problems with the overseer queue, which can lead to the queue
>>>> getting out of control and Solr pretty much dying. We are still on
>>>> Solr 6.3; 6.6 has some improvements and should handle these actions
>>>> faster. I would check what you see for
>>>> "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical
>>>> part is the "overseer_queue_size" value. If this goes up to about
>>>> 10000, it is pretty much game over on our setup. In that case it seems
>>>> best to stop all nodes, clear the queue in ZK, and then restart the
>>>> nodes one by one with a gap of about 5 minutes. That normally recovers
>>>> pretty well.
>>>>
>>>> regards,
>>>> Hendrik
>>>>
>>>> On 21.11.2017 20:12, Joe Obernberger wrote:
>>>>> We set the hard commit time long because we were having performance
>>>>> issues with HDFS, and thought that since the block size is 128M, a
>>>>> longer hard commit interval made sense. That was our hypothesis
>>>>> anyway. Happy to switch it back and see what happens.
>>>>>
>>>>> I don't know what caused the cluster to go into recovery in the first
>>>>> place. We had a server die over the weekend, but it's just one out of
>>>>> ~50. Every shard is 3x replicated (and 3x replicated in HDFS... so 9
>>>>> copies). It was at this point that we noticed lots of network
>>>>> activity, and most of the shards in this recovery, fail, retry loop.
>>>>> That is when we decided to shut it down, resulting in zombie lock
>>>>> files.
>>>>>
>>>>> I tried using the FORCELEADER call, which completed but doesn't seem
>>>>> to have any effect on the shards that have no leader. Kind of out of
>>>>> ideas for that problem. If I can get the cluster back up, I'll try a
>>>>> lower hard commit time. Thanks again Erick!
>>>>>
>>>>> -Joe
>>>>>
>>>>> On 11/21/2017 2:00 PM, Erick Erickson wrote:
>>>>>> Frankly, with HDFS I'm a bit out of my depth, so listen to Hendrik ;)...
>>>>>>
>>>>>> I need to back up a bit. Once nodes are in this state it's not
>>>>>> surprising that they need to be forcefully killed. I was more
>>>>>> thinking about how they got into this situation in the first place.
>>>>>> _Before_ you get into the nasty state, how are the Solr nodes shut
>>>>>> down? Forcefully?
>>>>>>
>>>>>> Your hard commit interval is far longer than it needs to be,
>>>>>> resulting in much larger tlog files etc. I usually set this to 15-60
>>>>>> seconds with local disks; I'm not quite sure whether longer
>>>>>> intervals are helpful on HDFS. What this means is that you can spend
>>>>>> up to 30 minutes when you restart Solr _replaying the tlogs_! If
>>>>>> Solr is killed, it may not have had a chance to fsync the segments
>>>>>> and may have to replay on startup. If you have openSearcher set to
>>>>>> false, the hard commit operation is not horribly expensive; it just
>>>>>> fsyncs the current segments and opens new ones. It won't be a total
>>>>>> cure, but I bet reducing this interval would help a lot.
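The manual commit mentioned in the next paragraph can also be issued from
SolrJ as part of a shutdown script, so the tlogs are essentially empty
before the nodes are stopped. A minimal sketch; the ZK connect string is a
placeholder and the collection name is taken from the logs above:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class PreShutdownCommit {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
                // Hard commit: waitFlush=true, waitSearcher=false,
                // softCommit=false. Flushes segments so there is little or
                // nothing left in the tlogs to replay on the next startup.
                client.commit("UNCLASS", true, false, false);
            }
        }
    }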
>>>>>>
>>>>>> Also, if you stop indexing there's no need to wait 30 minutes if
>>>>>> you issue a manual commit, something like
>>>>>> .../collection/update?commit=true. Just reducing the hard commit
>>>>>> interval will make the wait between stopping indexing and
>>>>>> restarting shorter all by itself if you don't want to issue the
>>>>>> manual commit.
>>>>>>
>>>>>> Best,
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
>>>>>> <hendrik.hadd...@gmx.net> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I see the write.lock issue as well when Solr has not been stopped
>>>>>>> gracefully. The write.lock files are then left behind in HDFS, as
>>>>>>> they are not removed automatically when the client disconnects,
>>>>>>> the way an ephemeral node in ZooKeeper would be. Unfortunately,
>>>>>>> Solr also does not realize that it should own the lock, even
>>>>>>> though it is marked as the owner in the state stored in ZooKeeper,
>>>>>>> and it is not willing to retry either, which is why you need to
>>>>>>> restart the whole Solr instance after the cleanup. I added some
>>>>>>> logic to my Solr start-up script that scans the lock files in
>>>>>>> HDFS, compares them with the state in ZooKeeper, and then deletes
>>>>>>> all lock files that belong to the node I'm starting.
>>>>>>>
>>>>>>> regards,
>>>>>>> Hendrik
>>>>>>>
>>>>>>> On 21.11.2017 14:07, Joe Obernberger wrote:
>>>>>>>> Hi All - we have a system with 45 physical boxes running Solr
>>>>>>>> 6.6.1 using HDFS as the index. The current index size is about
>>>>>>>> 31 TBytes; with 3x replication that takes up 93 TBytes of disk.
>>>>>>>> Our main collection is split across 100 shards with 3 replicas
>>>>>>>> each. The issue we're running into is when restarting the Solr 6
>>>>>>>> cluster. The shards go into recovery and start to utilize nearly
>>>>>>>> all of their network interfaces. If we start too many of the
>>>>>>>> nodes at once, the shards will go into a recovery, fail, and
>>>>>>>> retry loop and never come up. The errors are related to HDFS not
>>>>>>>> responding fast enough, plus warnings from the DFSClient. If we
>>>>>>>> stop a node while this is happening, the script will force a stop
>>>>>>>> (180 second timeout) and upon restart we have lock files
>>>>>>>> (write.lock) inside of HDFS.
>>>>>>>>
>>>>>>>> The process at this point is to start one node, find the lock
>>>>>>>> files, wait for it to come up completely (hours), stop it, delete
>>>>>>>> the write.lock files, and restart. Usually this second restart is
>>>>>>>> faster, but it can still take 20-60 minutes.
>>>>>>>>
>>>>>>>> The smaller indexes recover much faster (less than 5 minutes).
>>>>>>>> Should we have not used so many replicas with HDFS? Is there a
>>>>>>>> better way we should have built the Solr 6 cluster?
>>>>>>>>
>>>>>>>> Thank you for any insight!
>>>>>>>>
>>>>>>>> -Joe
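Along the lines of Hendrik's start-up script, the HDFS side of that cleanup
could look roughly like the sketch below. The namenode URI and Solr root
path are placeholders, and the cross-check against the ZooKeeper state
(only deleting locks for cores owned by the node being started) is left out:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class StaleLockCleaner {
        public static void main(String[] args) throws Exception {
            // Placeholder root of the Solr indexes in HDFS.
            Path solrRoot = new Path("hdfs://namenode:8020/solr");
            try (FileSystem fs =
                    FileSystem.get(solrRoot.toUri(), new Configuration())) {
                RemoteIterator<LocatedFileStatus> it =
                        fs.listFiles(solrRoot, true);  // recursive listing
                while (it.hasNext()) {
                    Path p = it.next().getPath();
                    // Filter to the cores this node owns (per ZK) before
                    // deleting; that check is omitted in this sketch.
                    if (p.getName().equals("write.lock")) {
                        System.out.println("Deleting stale lock: " + p);
                        fs.delete(p, false);  // non-recursive delete
                    }
                }
            }
        }
    }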