What you are describing has a fairly easy fix.

In the Azure network security group, restrict those public IP addresses so 
that they are accessible only from your own IP address or from the IP 
addresses that are meant to have access.
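
As a rough sketch of what that lock-down could look like with the Azure CLI: 
the resource group, NSG name, rule name, priority and source address below are 
all placeholders (assumptions, not values from this thread), and ports 7180 
and 8088 are only the usual defaults for Cloudera Manager and the Resource 
Manager web UI, so adjust them to your setup. The helper prints the command by 
default so it can be reviewed before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch only: allow inbound TCP to the management ports from one trusted
# address. Every name and value passed in is a placeholder you must replace.

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: allow_trusted_inbound <resource-group> <nsg-name> <trusted-cidr>
allow_trusted_inbound() {
  run az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name AllowTrustedInbound \
    --priority 100 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$3" \
    --destination-port-ranges 7180 8088
}

# Example (prints the command; set DRY_RUN=0 to apply it after `az login`):
# allow_trusted_inbound my-resource-group my-nsg 203.0.113.10/32
```

Any existing inbound rule that allows these ports from anywhere would still 
need to be removed or tightened, since an allow rule with a broader source 
would keep the ports open.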

Regards,
Jonathan Aquilina
EagleEyeT

Phone: +356 2033 0099
Mobile: +356 7995 7942
Email: [email protected]<mailto:[email protected]>
Website: https://eagleeyet.net

From: Gaurav Chhabra <[email protected]>
Sent: 13 June 2020 11:45
To: Hariharan <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Applications always showing in pending state even after cluster 
restart

Wow! What a guess, Hari! :) I wasn't sure those pending tasks could have been 
related to an attack. This happened to me from 1st to 5th June '20. I didn't 
check my Azure usage during that time, though I had been keeping tabs on it 
almost every day in May. On 8th June (Mon), when I checked the charges, the 
Azure 'data transfer out' charges were showing $88, $90 and $110 for 
bigdataserver-{5,6,7} respectively. I was shocked, as my charge for the 
previous month was around $53. I opened a ticket with Azure, and then we 
started the cluster again (with an Azure networking guy alongside me). Within 
3-4 minutes, data transfer out was again around 10-12 GB in total (from 3 
instances). We could only figure out that the hits were going to some blob 
storage in Azure. He said it was most likely a virus or some kind of attack.

I have now removed the public IPs from all instances except two (one where 
Cloudera Manager is hosted and another where the Resource Manager is 
running). Even those two exposed instances accept incoming requests only from 
my laptop's IP. Things are fine now.

One thing I don't get is how the attacker 'personally' benefits from this, 
apart from obviously raising my monthly bill.


Regards



On Sat, 13 Jun 2020 at 11:00, Hariharan 
<[email protected]<mailto:[email protected]>> wrote:
This is most likely an attempt to attack your system. If you are running your 
cluster in the cloud, you should run it in a private network so it is not 
exposed to the Internet. Alternatively you can secure your installation as 
described here - 
https://blog.cloudera.com/how-to-secure-internet-exposed-apache-hadoop/

Thanks,
Hari

On Fri, 12 Jun 2020, 12:20 Gaurav Chhabra, 
<[email protected]<mailto:[email protected]>> wrote:
Hi All,


I have started learning Hadoop and its related components. I am following a 
tutorial on Hadoop Administration on Udemy. As part of the learning process, I 
ran the following command:

$ hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter \
-Ddfs.replication=1 /user/bigdata/randomtextwriter

The above command created 30 files of 1 GB each. Then I ran the wordcount 
command below:

$ yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
-Dmapreduce.input.fileinputformat.split.minsize=268435456 \
-Dmapreduce.job.reduces=8 \
/user/bigdata/randomtext \
/user/bigdata/wordcount

After executing the above command, I decided to kill the application after 
some time, so I first ran 'yarn application -list', which listed a lot of 
applications, one of which was wordcount. I killed that particular 
application using 'yarn application -kill application-id'. However, when I 
checked the scheduler, I could see that several applications were still showing 
in Pending state, so I ran the following command:

$ for x in $(yarn application -list -appStates ACCEPTED \
    | awk 'NR > 2 { print $1 }'); do yarn application -kill "$x"; done
This did kill the applications: I could see the 'Apps Completed' count going 
up. But as soon as all the apps had been killed, I saw the same applications 
being created again. Even if I stop the whole cluster and start it again, the 
scheduler still shows submitted applications in Pending state.
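
When applications keep reappearing like this, it can help to see which user 
they are being submitted as. The helper below is only a sketch: it assumes the 
output of 'yarn application -list' has two header lines followed by 
tab-separated columns with the user in the fourth column, which may differ 
between Hadoop versions, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count listed applications per submitting user.
# Assumptions: two header lines, tab-separated columns, user in column 4.
tally_pending_by_user() {
  awk -F'\t' '
    NR > 2 && NF >= 4 {
      user = $4
      gsub(/^ +| +$/, "", user)   # strip the padding around the field
      count[user]++
    }
    END { for (u in count) print count[u], u }
  ' | sort -k1,1nr -k2,2          # highest count first, then by user name
}

# Example:
# yarn application -list -appStates ACCEPTED | tally_pending_by_user
```

A large count under a user you do not recognise would suggest that the 
applications are being submitted from outside rather than by your own jobs.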


Here's the content of fair-scheduler.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
    <queue name="root">
        <schedulingPolicy>drf</schedulingPolicy>
        <queue name="default">
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
    </queue>
    <queuePlacementPolicy>
        <rule name="specified" create="false"/>
        <rule name="default" create="true"/>
    </queuePlacementPolicy>
</allocations>
This is just a test cluster. I just want to kill the applications and clear 
the application queue. Any help would be much appreciated, as I have been 
struggling with this for the last few days.


Regards


---------------------------------------------------------------------
To unsubscribe, e-mail: 
[email protected]<mailto:[email protected]>
For additional commands, e-mail: 
[email protected]<mailto:[email protected]>
