Wow! What a guess, Hari! :) I wasn't sure those pending tasks could be
related to an attack. This happened to me from 1st to 5th June '20. I
didn't check my Azure usage during that time, though I had been keeping tabs
on it almost every day in May. On 8th June (Mon), when I checked the charges,
the Azure 'data transfer out' charges were showing $88, $90 & $110 for
bigdataserver-{5,6,7} respectively. I was shocked, as my charge for the
previous month was around $53. I opened a ticket with Azure, and we then
started the cluster again (with an Azure networking engineer alongside me).
Within 3-4 minutes, data transfer out was again around 10-12 GB in total
(from 3 instances). We could only figure out that the traffic was going to
some blob storage in Azure. He said it was most likely a virus or some kind
of attack.

I have now removed the public IPs from all instances except two (one
where Cloudera Manager is hosted and another where the Resource Manager is
running). Even those two accept incoming requests only from my laptop's
IP. Things are fine now.
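
For anyone hitting the same symptom, here is a rough sketch of how to see which users the pending applications were submitted under, using the same 'yarn application -list -appStates ACCEPTED' listing from the original mail. The helper name count_apps_by_user is made up for illustration, and it assumes the listing is tab-separated with the submitting user in the fourth column, which may differ between Hadoop versions. On clusters without authentication, anonymous submissions often show up under the default static user (dr.who), so a large count for an unfamiliar user is worth a closer look.

```shell
# Hypothetical helper: tally applications per submitting user.
# Assumption: rows are tab-separated and column 4 holds the user name.
count_apps_by_user() {
  awk -F'\t' '
    # Only consider rows whose first column is an application id,
    # which skips the summary and header lines.
    $1 ~ /^[ \t]*application_/ {
      gsub(/^[ \t]+|[ \t]+$/, "", $4)   # trim padding around the user name
      n[$4]++
    }
    END { for (u in n) print n[u], u }
  ' | sort -rn
}

# Usage (on a node with the yarn client configured):
#   yarn application -list -appStates ACCEPTED | count_apps_by_user
```

The function only reads the listing from standard input and never kills anything, so it is safe to run before deciding what to clean up.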

One thing that I don't get is how the attacker 'personally' benefits
from this, apart from obviously raising my monthly bill.


Regards



On Sat, 13 Jun 2020 at 11:00, Hariharan <[email protected]> wrote:

> This is most likely an attempt to attack your system. If you are running
> your cluster in the cloud, you should run it in a private network so it is
> not exposed to the Internet. Alternatively you can secure your installation
> as described here -
> https://blog.cloudera.com/how-to-secure-internet-exposed-apache-hadoop/
>
> Thanks,
> Hari
>
> On Fri, 12 Jun 2020, 12:20 Gaurav Chhabra, <[email protected]>
> wrote:
>
>> Hi All,
>>
>>
>> I have started learning Hadoop and its related components. I am following
>> a tutorial on Hadoop Administration on Udemy. As part of the learning
>> process, I ran the following command:
>>
>> $ hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter
>> -Ddfs.replication=1 /user/bigdata/randomtextwriter
>>
>> The above command created 30 files of 1 GB each. Then I ran the
>> wordcount command below:
>>
>> $ yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
>> wordcount \
>> -Dmapreduce.input.fileinputformat.split.minsize=268435456 \
>> -Dmapreduce.job.reduces=8 \
>> /user/bigdata/randomtext \
>> /user/bigdata/wordcount
>>
>> After executing the above command, I decided to kill the
>> application after some time, so I first ran 'yarn application -list',
>> which listed a great many applications, one of which was *wordcount*.
>> I killed that particular application using 'yarn application -kill
>> application-id'. However, when I checked the scheduler, I could see that
>> several applications were still showing in Pending state, so I ran the
>> following command:
>>
>> $ for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 {
>> print $1 }'); do yarn application -kill $x; done
>>
>> It was killing the applications, as I could see the 'Apps Completed' count
>> going up, but as soon as all the apps had been killed, I saw those
>> applications being created again. Even if I stop the whole cluster and
>> start it again, the scheduler shows that there are submitted applications
>> in Pending state.
>>
>> Here's the content of fair-scheduler.xml:
>>
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>> <allocations>
>>     <queue name="root">
>>         <schedulingPolicy>drf</schedulingPolicy>
>>         <queue name="default">
>>             <schedulingPolicy>drf</schedulingPolicy>
>>         </queue>
>>     </queue>
>>     <queuePlacementPolicy>
>>         <rule name="specified" create="false"/>
>>         <rule name="default" create="true"/>
>>     </queuePlacementPolicy>
>> </allocations>
>>
>> This is just a test cluster. I just want to kill the applications/clear
>> the application queue. Any help will really be appreciated, as I have been
>> struggling with this for the last few days.
>>
>>
>> Regards
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>
>
