Hi All, I'm trying to set up a machine farm using GNU Queue as the job scheduler, but I'm having difficulty understanding exactly what I need to do.
The situation is this. I have a number of machines; let's call them MachineA, MachineB, MachineC, etc. So my qhostsfile looks like this:

---------------------------------------------
MachineA
MachineB
MachineC
MachineD
MachineE
MachineF
MachineG
MachineH
MachineI
MachineJ
---------------------------------------------

Machines A-F are single-processor with 3GB of RAM.
Machines G-J are dual-processor with 2GB of RAM.

I have a number of different types of job that I want to run, each of which requires licences, and I have a fixed number of those. For example, only 6 jobs of Type 1 can run simultaneously, and only 8 jobs of Type 2. All jobs are expected to be run non-interactively.

JobType 1 requires a machine with 3GB of RAM, so it mustn't be allocated a 2GB box or share a machine with other jobs.
JobType 2 can run on 2GB boxes and can share.

My plan is to make a queue for each job type.

# cd /var/lib/queue
# ls -l
drwxr-xr-x 3 root root 512 Feb 27 12:00 JobType1
drwxr-xr-x 3 root root 512 Feb 27 12:01 JobType2
drwxr-xr-x 3 root root 512 Feb 27 11:50 now
drwxr-xr-x 3 root root 512 Jan 16 16:01 wait

So now I'm coming to write the profiles for the queues and I'm getting stuck (primarily because the documentation is little more than reference material). I've copied what I've got below.

Do these look about right?
Is there any way to ensure jobs don't get started on machines that have only a small amount of free memory?
Do the rlimit variables set limits (i.e. like the shell limit command)?
Is there any other setup I need to do?

I don't have access to the machine farm at the moment, so I'm trying to set up as much as possible beforehand, hence I can't try any of this just yet. Any insights anybody can give me will be useful.
Thanks
Paul

-------------------- Job Type 1 Profile ---------------------
exec on
mail /var/lib/queue/JobType1/mail_log
supervisor /var/lib/queue/JobType1/mail_log2
host MachineA pfactor 100
host MachineB pfactor 100
host MachineC pfactor 100
host MachineD pfactor 100
host MachineE pfactor 100
host MachineF pfactor 100
host MachineG pfactor 1
host MachineH pfactor 1
host MachineI pfactor 1
host MachineJ pfactor 1
maxexec 6
host MachineA vmaxexec 1
host MachineB vmaxexec 1
host MachineC vmaxexec 1
host MachineD vmaxexec 1
host MachineE vmaxexec 1
host MachineF vmaxexec 1
host MachineG vmaxexec 0
host MachineH vmaxexec 0
host MachineI vmaxexec 0
host MachineJ vmaxexec 0

-------------------- Job Type 2 Profile ---------------------
exec on
mail /var/lib/queue/JobType2/mail_log
supervisor /var/lib/queue/JobType2/mail_log2
host MachineA pfactor 1
host MachineB pfactor 1
host MachineC pfactor 1
host MachineD pfactor 1
host MachineE pfactor 1
host MachineF pfactor 1
host MachineG pfactor 200
host MachineH pfactor 200
host MachineI pfactor 200
host MachineJ pfactor 200
maxexec 8
host MachineA vmaxexec 0
host MachineB vmaxexec 0
host MachineC vmaxexec 0
host MachineD vmaxexec 0
host MachineE vmaxexec 0
host MachineF vmaxexec 0
host MachineG vmaxexec 2
host MachineH vmaxexec 2
host MachineI vmaxexec 2
host MachineJ vmaxexec 2
-------------------------------------------------------------

--
Paul Sargent
mailto: [EMAIL PROTECTED]
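
P.S. The two profiles above differ only in which group of machines is preferred, so it may be less error-prone to generate them than to type twenty host lines by hand. Here is a rough Python sketch of that idea. The directive names (exec, mail, supervisor, host, pfactor, maxexec, vmaxexec) and their layout are simply copied from my drafts above and are not verified against the GNU Queue documentation; the helper names render_profile and host_settings are made up for illustration.

```python
# Sketch only: directive names and layout are copied from the draft profiles
# in this post and are NOT verified against the GNU Queue documentation.

SINGLE_CPU_3GB = ["MachineA", "MachineB", "MachineC",
                  "MachineD", "MachineE", "MachineF"]
DUAL_CPU_2GB = ["MachineG", "MachineH", "MachineI", "MachineJ"]


def host_settings(preferred, others, pfactor, slots):
    """Return (host, pfactor, vmaxexec) tuples, sorted by host name.

    Preferred hosts get the given pfactor and per-host slot count;
    all other hosts get pfactor 1 and zero slots.
    """
    table = {h: (pfactor, slots) for h in preferred}
    table.update({h: (1, 0) for h in others})
    return [(h, *table[h]) for h in sorted(table)]


def render_profile(queue_name, maxexec, settings):
    """Build the profile text for one queue (hypothetical helper)."""
    base = f"/var/lib/queue/{queue_name}"
    lines = ["exec on",
             f"mail {base}/mail_log",
             f"supervisor {base}/mail_log2"]
    lines += [f"host {h} pfactor {p}" for h, p, _ in settings]
    lines.append(f"maxexec {maxexec}")  # licence limit for this job type
    lines += [f"host {h} vmaxexec {v}" for h, _, v in settings]
    return "\n".join(lines) + "\n"


# JobType1: 6 licences, 3GB boxes only, one job per machine.
profile1 = render_profile(
    "JobType1", 6, host_settings(SINGLE_CPU_3GB, DUAL_CPU_2GB, 100, 1))

# JobType2: 8 licences, 2GB dual-CPU boxes, two jobs per machine.
profile2 = render_profile(
    "JobType2", 8, host_settings(DUAL_CPU_2GB, SINGLE_CPU_3GB, 200, 2))
```

Printing profile1 and profile2 should reproduce the two drafts above, and changing a licence count or moving a machine between groups then becomes a one-line edit.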