Re: Recovery Issue - Solr 6.6.1 and HDFS

Joe Obernberger Wed, 22 Nov 2017 05:19:09 -0800

Hi Hendrik - I was halting a replica and then restarting it, waited, then restarted another one. That didn't work, but when I halted all three, and then restarted those one by one, the shard finally elected a leader and came up. Phew! I too noticed the lock files in index.<timestamp> folders. Usually what I do is:

hadoop fs -ls -R /solr6.6.0 | grep write.lock > out.txt
then
cat out.txt | cut --bytes 57-
to get a list of files to delete
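
A fixed byte offset such as 57 only lines up while the owner, group and size columns keep the same width. As a rough sketch, the same list can be pulled out by splitting each line into fields instead. This assumes the usual eight-column layout of `hadoop fs -ls -R` output (permissions, replication, owner, group, size, date, time, path); the function name and the sample listing are made up for illustration.

```python
def lock_paths(ls_output, lock_name="write.lock"):
    """Return the paths of lock files found in `hadoop fs -ls -R` output."""
    paths = []
    for line in ls_output.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path.
        # Splitting at most 7 times keeps a path containing spaces in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips "Found N items" headers and blank lines
        path = fields[7].rstrip()
        if path.rsplit("/", 1)[-1] == lock_name:
            paths.append(path)
    return paths


# Hypothetical listing; the core_node names and timestamps are invented.
sample = """\
Found 3 items
drwxr-xr-x   - solr solr          0 2017-11-21 14:07 /solr6.6.0/UNCLASS/core_node1/data/index
-rw-r--r--   3 solr solr          0 2017-11-21 14:07 /solr6.6.0/UNCLASS/core_node1/data/index/write.lock
-rw-r--r--   3 solr hadoop-users  0 2017-11-20 09:15 /solr6.6.0/UNCLASS/core_node2/data/index.20171120091500000/write.lock
"""

for p in lock_paths(sample):
    print(p)
```

The second lock file has a longer group name, which is exactly the case where a byte offset would cut the path in the wrong place.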


Glad these shards have come up!  Thanks very much.

-Joe


On 11/22/2017 5:20 AM, Hendrik Haddorp wrote:

Hi Joe,
sorry, I have not seen that problem. I would normally not delete a replica if the shard is down but only if there is an active shard. Without an active leader the replica should not be able to recover. I also just had a case where all replicas of a shard stayed in down state and restarts didn't help. This was however also caused by lock files. Once I cleaned them up and restarted all Solr instances that had a replica they recovered.
For the lock files I discovered that the index is not always in the "index" folder but can also be in an index.<timestamp> folder. There can be an "index.properties" file in the "data" directory in HDFS and this contains the correct index folder name.
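
A minimal sketch of reading that folder name, assuming index.properties is a plain Java-style properties file and that the folder is stored under a key named `index` (the key name is an assumption here, so check it against a real file first). The function name and the sample content are invented for illustration.

```python
def active_index_dir(properties_text, default="index"):
    """Return the index folder named in index.properties, or `default`."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Properties files use '#' or '!' for comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        # Assumption: the active folder is stored under the key "index".
        if sep and key.strip() == "index":
            return value.strip()
    # No entry found: fall back to the plain "index" folder.
    return default


# Hypothetical file content.
sample_properties = """\
#index.properties
#Tue Nov 21 14:07:00 UTC 2017
index=index.20171121140700123
"""

print(active_index_dir(sample_properties))
```

The lock file to look for would then be write.lock inside the returned folder, under the core's "data" directory.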
If you are really desperate you could also delete all but one replica so that the leader election is quite trivial. But this does of course increase the risk of finally losing the data quite a bit. So I would try looking into the code and figure out what the problem is here and maybe compare the state in HDFS and ZK with a shard that works.
regards,
Hendrik

On 21.11.2017 23:57, Joe Obernberger wrote:
Hi Hendrik - the shards in question have three replicas. I tried restarting each one (one by one) - no luck. No leader is found. I deleted one of the replicas and added a new one, and the new one also shows as 'down'. I also tried the FORCELEADER call, but that had no effect. I checked the OVERSEERSTATUS, but there is nothing unusual there. I don't see anything useful in the logs except the error:
org.apache.solr.common.SolrException: Error getting leader from zk for shard shard21
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:996)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
    at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)
    ... 7 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)
    ... 9 more
Can I modify zookeeper to force a leader? Is there any other way to recover from this? Thanks very much!
-Joe


On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
We sometimes also have replicas not recovering. If one replica is left active, the easiest fix is to delete the replica that is not recovering and create a new one. When all replicas are down it helps most of the time to restart one of the nodes that contains a replica in down state. If that also doesn't get the replica to recover I would check the logs of the node and also those of the overseer node. I have seen the same issue on Solr using local storage. The main HDFS-related issues we have had so far were those lock files and, when deleting and recreating collections/cores, data that was sometimes not cleaned up in HDFS and then caused a conflict.
Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no comparison. What we've been doing is keeping two main collections - all data, and the last 30 days of data. Then we handle queries based on date range. The 30 day index is significantly faster.
My main concern right now is that 6 of the 100 shards are not coming back because of no leader. I've never seen this error before. Any ideas? ClusterStatus shows all three replicas with state 'down'.
Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issues with HDFS at the moment. We are doing lots of soft commits for NRT search. Those seem to be slower than with local storage. The investigation has however not gotten very far yet.
We have a setup with 2000 collections, with one shard each and a replication factor of 2 or 3. When we restart nodes too fast that causes problems with the overseer queue, which can lead to the queue getting out of control and Solr pretty much dying. We are still on Solr 6.3. 6.6 has some improvements and should handle these actions faster. I would check what you see for "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical part is the "overseer_queue_size" value. If this goes up to about 10000 it is pretty much game over on our setup. In that case it seems to be best to stop all nodes, clear the queue in ZK and then restart the nodes one by one with a gap of like 5 min. That normally recovers pretty well.
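
A small sketch of watching that value, assuming the OVERSEERSTATUS JSON response carries "overseer_queue_size" as a top-level field. The function names and the sample response are made up for illustration, and the 10000 limit is just the figure quoted above for one particular setup, not a general rule.

```python
import json


def overseer_queue_size(response_text):
    """Extract overseer_queue_size from an OVERSEERSTATUS JSON response."""
    # Assumption: the field sits at the top level of the response object.
    return int(json.loads(response_text)["overseer_queue_size"])


def queue_is_critical(response_text, limit=10000):
    """True once the queue has reached the size at which the setup above gave up."""
    return overseer_queue_size(response_text) >= limit


# Hypothetical, heavily trimmed response body.
sample_response = '{"responseHeader": {"status": 0, "QTime": 12}, "overseer_queue_size": 42}'

print(overseer_queue_size(sample_response))
print(queue_is_critical(sample_response))
```

The response text itself could be fetched with any HTTP client from the URL quoted above; it is left out here so the sketch runs without a cluster.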
regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance issues with HDFS, and thought that since the block size is 128M, having a longer hard commit made sense. That was our hypothesis anyway. Happy to switch it back and see what happens.
I don't know what caused the cluster to go into recovery in the first place. We had a server die over the weekend, but it's just one out of ~50. Every shard is 3x replicated (and 3x replicated in HDFS...so 9 copies). It was at this point that we noticed lots of network activity, and most of the shards in this recovery, fail, retry loop. That is when we decided to shut it down, resulting in zombie lock files.
I tried using the FORCELEADER call, which completed, but doesn't seem to have any effect on the shards that have no leader. Kinda out of ideas for that problem. If I can get the cluster back up, I'll try a lower hard commit time. Thanks again Erick!
-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:
Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...
I need to back up a bit. Once nodes are in this state it's not surprising that they need to be forcefully killed. I was more thinking about how they got in this situation in the first place. _Before_ you get into the nasty state how are the Solr nodes shut down? Forcefully?
Your hard commit is far longer than it needs to be, resulting in much larger tlog files etc. I usually set this at 15-60 seconds with local disks, not quite sure whether longer intervals are helpful on HDFS. What this means is that you can spend up to 30 minutes when you restart solr _replaying the tlogs_! If Solr is killed, it may not have had a chance to fsync the segments and may have to replay on startup. If you have openSearcher set to false, the hard commit operation is not horribly expensive, it just fsync's the current segments and opens new ones. It won't be a total cure, but I bet reducing this interval would help a lot.
Also, if you stop indexing there's no need to wait 30 minutes if you issue a manual commit, something like .../collection/update?commit=true. Just reducing the hard commit interval will make the wait between stopping indexing and restarting shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:
Hi,
the write.lock issue I see as well when Solr has not been stopped gracefully. The write.lock files are then left in HDFS, as they do not get removed automatically when the client disconnects like an ephemeral node in ZooKeeper. Unfortunately Solr also does not realize that it should be owning the lock, as it is marked in the state stored in ZooKeeper as the owner, and it is also not willing to retry, which is why you need to restart the whole Solr instance after the cleanup. I added some logic to my Solr startup script which scans the lock files in HDFS, compares them with the state in ZooKeeper, and then deletes all lock files that belong to the node that I'm starting.
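
The script itself is not shown in this thread, so the following is only a guess at how such a comparison could look. It assumes index paths of the form <hdfs home>/<collection>/<core node name>/data/... and a cluster state shaped like Solr's state.json (collection, shards, replicas, node_name). All function names, the "/solr" home and the sample data are hypothetical.

```python
def cores_on_node(cluster_state, node_name):
    """Collect (collection, core node name) pairs hosted on `node_name`."""
    owned = set()
    for collection, info in cluster_state.items():
        for shard in info.get("shards", {}).values():
            for core_node, replica in shard.get("replicas", {}).items():
                if replica.get("node_name") == node_name:
                    owned.add((collection, core_node))
    return owned


def locks_for_node(lock_paths, cluster_state, node_name, hdfs_home="/solr"):
    """Pick the write.lock paths that belong to replicas on `node_name`."""
    owned = cores_on_node(cluster_state, node_name)
    prefix = hdfs_home.rstrip("/") + "/"
    selected = []
    for path in lock_paths:
        if not path.startswith(prefix) or not path.endswith("/write.lock"):
            continue
        # Assumed layout: <hdfs_home>/<collection>/<core_node>/data/<index dir>/write.lock
        parts = path[len(prefix):].split("/")
        if len(parts) >= 2 and (parts[0], parts[1]) in owned:
            selected.append(path)
    return selected


# Hypothetical cluster state and lock listing.
state = {
    "UNCLASS": {
        "shards": {
            "shard21": {
                "replicas": {
                    "core_node1": {"node_name": "host1:8983_solr", "state": "down"},
                    "core_node2": {"node_name": "host2:8983_solr", "state": "down"},
                }
            }
        }
    }
}
locks = [
    "/solr/UNCLASS/core_node1/data/index/write.lock",
    "/solr/UNCLASS/core_node2/data/index.20171120091500000/write.lock",
]

print(locks_for_node(locks, state, "host1:8983_solr"))
```

Restricting the deletion to the node being started matters because a lock held by a running Solr instance on another node is still legitimate.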

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running Solr 6.6.1 using HDFS as the index. The current index size is about 31 TBytes. With 3x replication that takes up 93 TBytes of disk. Our main collection is split across 100 shards with 3 replicas each. The issue that we're running into is when restarting the solr6 cluster. The shards go into recovery and start to utilize nearly all of their network interfaces. If we start too many of the nodes at once, the shards will go into a recovery, fail, and retry loop and never come up. The errors are related to HDFS not responding fast enough and warnings from the DFSClient. If we stop a node when this is happening, the script will force a stop (180 second timeout) and upon restart, we have lock files (write.lock) inside of HDFS.
The process at this point is to start one node, find the lock files, wait for it to come up completely (hours), stop it, delete the write.lock files, and restart. Usually this second restart is faster, but it still can take 20-60 minutes.
The smaller indexes recover much faster (less than 5 minutes). Should we have not used so many replicas with HDFS? Is there a better way we should have built the solr6 cluster?

Thank you for any insight!

-Joe