My cluster has 9 machines with 643 GB of memory and 288 vCores in total. I use the Fair Scheduler.
Before I turned preemption on, my fair-scheduler.xml was:
<allocations>
  <queue name="highPriority">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>250000 mb, 100 vcores</maxResources>
  </queue>
  <queue name="default">
    <minResources>50000 mb, 20 vcores</minResources>
    <maxResources>100000 mb, 50 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="ep">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>300000 mb, 100 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="vip">
    <minResources>30000 mb, 20 vcores</minResources>
    <maxResources>60000 mb, 50 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
</allocations>
The corresponding preemption setting in yarn-site.xml was:
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>false</value>
</property>
The sum of the maxResources across the queues is roughly equal to the total memory of the cluster. But because of the maxResources limit on each queue, when one queue is full and the other queues are idle, the full queue still cannot borrow any resources from them. As a result, the overall resource usage of my cluster is always low.
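As a quick sanity check, here is a small sketch comparing the per-queue caps above against the cluster's physical memory (assuming 643 GB means 643 * 1024 MB; the caps actually add up slightly above the physical total, so they are only roughly equal):

```python
# Per-queue maxResources memory from fair-scheduler.xml (in MB)
max_resources_mb = {
    "highPriority": 250000,
    "default": 100000,
    "ep": 300000,
    "vip": 60000,
}

total_max_mb = sum(max_resources_mb.values())  # sum of all queue caps
cluster_mb = 643 * 1024                        # 643 GB of cluster memory in MB

print(f"sum of maxResources: {total_max_mb} MB")
print(f"cluster memory:      {cluster_mb} MB")
# The caps add up to a bit more than the cluster, so no single queue
# can ever use the whole cluster, even when the others are idle.
```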
So I turned preemption on and changed fair-scheduler.xml to this:
<allocations>
  <queue name="highPriority">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>500000 mb, 100 vcores</maxResources>
    <weight>0.35</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
  </queue>
  <queue name="default">
    <minResources>25000 mb, 20 vcores</minResources>
    <maxResources>225000 mb, 70 vcores</maxResources>
    <weight>0.14</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="ep">
    <minResources>200000 mb, 30 vcores</minResources>
    <maxResources>600000 mb, 100 vcores</maxResources>
    <weight>0.42</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="vip">
    <minResources>50000 mb, 20 vcores</minResources>
    <maxResources>120000 mb, 30 vcores</maxResources>
    <weight>0.09</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
</allocations>
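And in yarn-site.xml I flipped the preemption flag to true. (The second property below is the stock Hadoop cluster-utilization threshold shown with its default value, included only for context; I did not change it.)

```xml
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<!-- Preemption only kicks in once overall cluster utilization
     exceeds this threshold (Hadoop default: 0.8f). -->
<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.8f</value>
</property>
```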
I roughly doubled the maxResources for each queue. I know that even with preemption on, the memory used by each queue will not exceed its maxResources cap.
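For reference, the weights above imply the following steady-state fair shares. This is a rough sketch that ignores minResources adjustments and assumes the whole 643 GB is schedulable:

```python
# Rough steady-state fair shares implied by the queue weights
# (ignores minResources adjustments; assumes all 643 GB is schedulable).
cluster_mb = 643 * 1024  # 658432 MB

weights = {"highPriority": 0.35, "default": 0.14, "ep": 0.42, "vip": 0.09}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # the weights sum to 1

for queue, w in weights.items():
    fair_share = cluster_mb * w
    # A queue becomes eligible to preempt once it has stayed below
    # fairSharePreemptionThreshold * fair_share for
    # fairSharePreemptionTimeout (25) seconds.
    threshold = 0.5 if queue == "default" else 0.8
    print(f"{queue:>12}: fair share ~ {fair_share:9.0f} MB, "
          f"preemption trigger below {fair_share * threshold:9.0f} MB")
```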
Cluster resource usage can now reach 90%, which shows that total utilization has improved, but YARN applications always take much longer to finish. I don't know what is happening.
Could anyone give me some suggestions?