Hi All,

The following are the requirements, which should be implemented through Sun Grid Engine:
1. Q1 - One queue with all the cores (e.g., 2 nodes, 8 cores)
2. Q2 & Q3 - Two more queues under the large queue (6 cores & 2 cores)
3. user1 & user2 are allowed to submit jobs to Q2
4. user3 & user4 are allowed to submit jobs to Q3 only
5. user1 is also allowed to submit jobs to Q1 (burstable)
6. Can an application be allowed to run only through Q2?
7. Can an application be allowed to run through Q3 & Q1?

The cluster has 2 systems (1 master + 1 node). Each is dual-processor, dual-core, so 4 cores each and 8 cores in total.

Below I explain what I have done so far and what is still open.

Created queues:
  q1: Hosts=2 (Master+Node), slots=4
  q2: Hosts=2 (Master+Node), slots=3
  q3: Hosts=1 (Node), slots=1

Users: user1, user2, user3, user4
Usersets: userset12, userset34
user1 & user2 belong to userset12; user3 & user4 belong to userset34.

1. Q1 - One queue with all the cores (e.g., 2 nodes, 8 cores)

Created queue q1 with 2 nodes, slots=4.

2. Q2 & Q3 - Two more queues under the large queue (6 cores & 2 cores)

Created as listed above. How can I make these two queues fall under q1, the large queue? What are subordinate queues? If we make q2 and q3 subordinate to q1 (q2 with 6 cores & q3 with 2 cores), does that meet our requirement? If not, is there another way to do it?

3. user1 & user2 allowed to submit jobs to Q2

I've given userset12 access to q2, so user1 and user2 can submit jobs to q2, and I've set userset34 as the xuserset.

Now, if user1 or user2 submits a parallel job with 6 MPI processes, will it take 4 cores from the master and 2 cores from the node? I tested it: a 6-process job did not get executed, but a 3-process job did. The error is:

... ....
parallel environment: mpiq2 range: 6
scheduling info:
  has no permission for queue " [EMAIL PROTECTED]"
  cannot run in queue instance " [EMAIL PROTECTED]" because it is not contained in its hard queue list (-q)
  has no permission for queue " [EMAIL PROTECTED]"
  has no permission for queue " [EMAIL PROTECTED]"
  has no permission for queue " [EMAIL PROTECTED]"
  cannot run in queue instance " [EMAIL PROTECTED]" because it is not contained in its hard queue list (-q)
  has no permission for queue " [EMAIL PROTECTED]"
  cannot run in PE "mpiq2" because it only offers 0 slots

That is, the job does not run when it has more than 3 MPI processes. What went wrong here?

Serial jobs work. If user1/user2 submits 6 serial jobs, 3 run on the master and 3 on the compute node. If a 7th job is submitted, it waits in the queue and starts when a slot becomes free. So only the problem with parallel jobs has to be resolved.

4. user3 & user4 allowed to submit jobs to Q3 only

I've given access to userset34 and set xuserset=userset12. If user3 or user4 submits a 2-process job, the job is accepted but does not execute. The error is:

parallel environment: mpiq34 range: 2
scheduling info:
  has no permission for host "locuzcluster.local"
  has no permission for host "compute-0-0.local"
  cannot run in PE "mpiq34" because it only offers 0 slots

The PE config is as follows:

$ qconf -sp mpiq34
pe_name            mpiq34
slots              2
user_lists         userset34
xuser_lists        userset12
start_proc_args    /share/apps/MPICH2/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /share/apps/MPICH2/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min

I don't see how to resolve this issue. There might be something wrong in the settings.

5. user1 is also allowed to submit jobs to Q1 (burstable)

For this I've added user1 to the owner list of q1 and set userset34 as the xuserset.

6. Can an application be allowed to run only through Q2?
7. Can an application be allowed to run through Q3 & Q1?
These two still have to be implemented.

Can anyone help me resolve the issues mentioned above?

Thanks,
Sangamesh
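P.S. To make the setup easier to check, here is a small Python sketch of how I understand the slot arithmetic and the userset rules. It is only a model of my configuration, not anything taken from SGE itself. The host labels "master" and "node" and the helper names total_slots and has_access are made up for illustration. It assumes that "slots" is counted per host in a queue and that an xuser_lists entry overrides a user_lists entry.

```python
# Queue layout as described in this mail; "slots" is taken to be per host.
# The host labels "master" and "node" are placeholders, not real host names.
QUEUES = {
    "q1": {"hosts": ["master", "node"], "slots_per_host": 4},
    "q2": {"hosts": ["master", "node"], "slots_per_host": 3},
    "q3": {"hosts": ["node"], "slots_per_host": 1},
}

USERSETS = {
    "userset12": {"user1", "user2"},
    "userset34": {"user3", "user4"},
}


def total_slots(queue):
    """Total slots a queue offers across all of its hosts."""
    q = QUEUES[queue]
    return len(q["hosts"]) * q["slots_per_host"]


def has_access(user, user_lists, xuser_lists):
    """Assumed rule: excluded usersets win; an empty user_lists allows the rest."""
    if any(user in USERSETS[name] for name in xuser_lists):
        return False
    if not user_lists:
        return True
    return any(user in USERSETS[name] for name in user_lists)


for name in QUEUES:
    print(name, total_slots(name))

# q2 as configured: userset12 allowed, userset34 excluded.
print(has_access("user1", ["userset12"], ["userset34"]))
print(has_access("user3", ["userset12"], ["userset34"]))
```

If this arithmetic is right, q1 offers 8 slots and q2 offers 6, but q3 as I configured it offers only 1 slot, which is less than the 2 cores in requirement 2 and less than the 2-process job in item 4. I don't know whether that is related to the errors above.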
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf