Hi Wu. If yarn.nodemanager.resource.memory-mb is greater than the physical 
memory on a node, the scheduler will assign more containers to that node than 
it can actually hold. They will still run, but the node will swap heavily to 
disk, which slows down every task running on it.
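As a sketch (the 53248 value below is illustrative, not a recommendation), on a 64 GB node you would cap the NodeManager's advertised memory below the physical RAM in that node's own yarn-site.xml, leaving headroom for the OS and Hadoop daemons:

```xml
<!-- yarn-site.xml on a 64 GB node: advertise less than physical RAM -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- e.g. 52 GB for containers, leaving ~12 GB for OS, DataNode, NodeManager -->
  <value>53248</value>
</property>
```

This has to be set per node; pushing one value to heterogeneous nodes is exactly what causes the over-commit described above.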
I don't know much about the FairScheduler's preemption, but if preemption is 
too aggressive, it can kill more containers than necessary, which causes 
applications to lose work that then has to be redone.
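For reference, FairScheduler preemption is switched on in the ResourceManager's yarn-site.xml, and in recent 2.x releases how eagerly it fires can, as far as I know, be bounded with a cluster-utilization threshold; the values below are only illustrative:

```xml
<!-- ResourceManager yarn-site.xml: enable FairScheduler preemption -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<property>
  <!-- only preempt once overall cluster utilization exceeds 80% -->
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.8</value>
</property>
```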

      From: wuchang <[email protected]>
 To: [email protected] 
Cc: Chang. Wu <[email protected]>
 Sent: Monday, May 22, 2017 11:32 PM
 Subject: What if the configured node memory in yarn-site.xml is more than 
node's physical memory?
   
My YARN cluster uses the FairScheduler with four queues; below is my queue 
configuration:

<allocations>
  <queue name="highPriority">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>250000 mb, 100 vcores</maxResources>
  </queue>
  <queue name="default">
    <minResources>50000 mb, 20 vcores</minResources>
    <maxResources>100000 mb, 50 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="ep">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>300000 mb, 100 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="vip">
    <minResources>30000 mb, 20 vcores</minResources>
    <maxResources>60000 mb, 50 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
</allocations>

As you can see, I didn't configure any preemption, so everything was at least 
running OK, except that the total resource usage rate of my cluster was not 
very high. So I decided to turn on preemption and modified fair-scheduler.xml 
as below:
<allocations>
  <queue name="highPriority">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>300000 mb, 100 vcores</maxResources>
    <weight>0.35</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>18</maxRunningApps>
  </queue>
  <queue name="default">
    <minResources>50000 mb, 20 vcores</minResources>
    <maxResources>140000 mb, 70 vcores</maxResources>
    <weight>0.14</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>20</maxRunningApps>
  </queue>
  <queue name="ep">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>600000 mb, 100 vcores</maxResources>
    <weight>0.42</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>20</maxRunningApps>
  </queue>
  <queue name="vip">
    <minResources>6000 mb, 20 vcores</minResources>
    <maxResources>120000 mb, 30 vcores</maxResources>
    <weight>0.09</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>
Yes, after preemption was turned on, the total resource usage rate of my 
cluster went up to 90%+, but after one night (midnight is the busiest time for 
my YARN cluster), I found that many applications were delayed.
After a long time of troubleshooting, I found that in my 9-machine cluster, 5 
machines have 128 GB of physical memory and the remaining 4 have 64 GB, but in 
all of their yarn-site.xml files, yarn.nodemanager.resource.memory-mb is 
configured as 97280. That is to say, on those 4 machines the 
yarn.nodemanager.resource.memory-mb setting is actually more than the physical 
memory.
So I suspect this is what caused the phenomenon: even though the total cluster 
resource usage improved, each application takes more time to execute and is 
seriously delayed.

Any suggestions?

   
