What you are describing is a fairly easy fix. In the Azure network security group, lock down those public IP addresses so that they are reachable only from your own IP address or from the IP addresses that are meant to have access.
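As a rough sketch of that lock-down with the Azure CLI: the resource group, NSG name, rule names, and source IP below are placeholders I am assuming, and ports 7180 and 8088 are only the usual defaults for the Cloudera Manager and ResourceManager web UIs, so check what your cluster actually uses.

```shell
# Placeholder values (assumptions): replace with your own.
RESOURCE_GROUP="bigdata-rg"
NSG_NAME="bigdata-nsg"

# Add an inbound allow rule for one source IP and one TCP port.
#   $1 = rule name, $2 = priority, $3 = source IP, $4 = destination port
# Set AZ=echo to print the command instead of running it (dry run).
allow_from_ip() {
  "${AZ:-az}" network nsg rule create \
    --resource-group "$RESOURCE_GROUP" \
    --nsg-name "$NSG_NAME" \
    --name "$1" --priority "$2" \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$3" \
    --destination-port-ranges "$4"
}

# Example calls (203.0.113.10 is a documentation-range placeholder IP):
#   allow_from_ip allow-cm-from-laptop 100 203.0.113.10 7180
#   allow_from_ip allow-rm-from-laptop 110 203.0.113.10 8088
```

This only helps if no broader inbound allow rule with a higher precedence (lower priority number) already exists on the same NSG, so it is worth listing the existing rules first.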

Regards,
Jonathan Aquilina
EagleEyeT
Phone: +356 2033 0099
Mobile: +356 7995 7942
Email: [email protected]
Website: https://eagleeyet.net

From: Gaurav Chhabra <[email protected]>
Sent: 13 June 2020 11:45
To: Hariharan <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Applications always showing in pending state even after cluster restart

Wow! What a guess, Hari! :) I wasn't sure those pending tasks could have been related to an attack.

This happened to me from 1st to 5th June '20. I didn't check my Azure usage during that time, though I had been keeping tabs on it almost every day in May. On 8th June (Mon), when I checked the charges, the Azure 'data transfer out' charges were showing $88, $90 & $110 for bigdataserver-{5,6,7} respectively. I was shocked, as my charge for the previous month was around $53.

I opened a ticket with Azure and then we started the cluster again (with the Azure networking guy along with me), and within 3-4 minutes data transfer out was again around 10-12 GB in total (from 3 instances). We could only figure out that the hits were going to some blob storage in Azure. He said it most likely was a virus or some attack.

I have now removed public IPs from all instances except two (one where Cloudera Manager is hosted and another where Resource Manager is running). Even those two exposed ones only allow incoming requests from my laptop's IP. Things are fine now.

One thing that I don't get is how the attacker is 'personally' benefitting from this, except for obviously raising my monthly bill.

Regards

On Sat, 13 Jun 2020 at 11:00, Hariharan <[email protected]> wrote:

This is most likely an attempt to attack your system. If you are running your cluster in the cloud, you should run it in a private network so it is not exposed to the Internet.
Alternatively you can secure your installation as described here - https://blog.cloudera.com/how-to-secure-internet-exposed-apache-hadoop/

Thanks,
Hari

On Fri, 12 Jun 2020, 12:20 Gaurav Chhabra, <[email protected]> wrote:

Hi All,

I have started learning Hadoop and its related components. I am following a tutorial on Hadoop Administration on Udemy. As part of the learning process, I ran the following command:

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter -Ddfs.replication=1 /user/bigdata/randomtextwriter

The above command created 30 files of 1 GB each. Then I ran the reduce command below:

$ yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    wordcount \
    -Dmapreduce.input.fileinputformat.split.minsize=268435456 \
    -Dmapreduce.job.reduces=8 \
    /user/bigdata/randomtext \
    /user/bigdata/wordcount

After executing the above command, I thought of killing the application after some time, so I first ran 'yarn application -list', which listed a lot of applications, one of which was wordcount. I killed that particular application using 'yarn application -kill application-id'. However, when I checked the scheduler, I could see that several applications were still showing in Pending state, so I ran the following command:

$ for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done

It was killing the applications, as I could see the 'Apps Completed' count going up, but as soon as all the apps had been killed, I saw those applications getting created again. Even if I stop the whole cluster and start it again, the scheduler shows that there are submitted applications in Pending state.
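
For reference, the kill loop above could be sketched slightly more defensively as below. The function names `app_ids` and `kill_apps_in_state` are made up for illustration. The sketch assumes application IDs have the usual `application_<timestamp>_<sequence>` form and matches on that pattern rather than on the number of header lines. As the replies in this thread point out, it only clears the queue and does nothing to stop whoever is resubmitting the jobs.

```shell
# Print the application IDs found in `yarn application -list` output (stdin).
# Matching on the ID pattern skips header lines without counting them.
app_ids() {
  awk '$1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }'
}

# Kill every application in the given state (default: ACCEPTED).
#   $1 = value passed to -appStates
kill_apps_in_state() {
  yarn application -list -appStates "${1:-ACCEPTED}" 2>/dev/null |
    app_ids |
    while read -r app_id; do
      yarn application -kill "$app_id"
    done
}

# Example call:
#   kill_apps_in_state ACCEPTED
```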

Here's the content of fair-scheduler.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
    <queue name="root">
        <schedulingPolicy>drf</schedulingPolicy>
        <queue name="default">
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
    </queue>
    <queuePlacementPolicy>
        <rule name="specified" create="false"/>
        <rule name="default" create="true"/>
    </queuePlacementPolicy>
</allocations>

This is just a test cluster. I just want to kill the applications/clear the application queue. Any help will really be appreciated as I am struggling with it for the last few days.

Regards

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
