Hi,

On 13.04.2008 at 04:52, Lance S. Jacobsen wrote:
I recently put together a small cluster of Xeons using CentOS 5.1 x86_64. This cluster is my first real experience with Linux administration. It took some learning to install NIS, NFS, etc., but now the machines seem to be working well, so I am working on the next step: installing a queue scheduler. I decided on TORQUE 2.3.0 since it's free and I don't know any better. I have installed it and am having trouble getting it to detect my nodes.

I think the problem is that I named them starting with numbers in my /etc/hosts file: 1of12, 2of12, ..., 12of12, instead of something like node01, node02, ...

After the installation, TORQUE did not create a file called 'nodes' which it told me that I needed, and so after searching the web I found the command to create it:

# qmgr -c "create node 2of12"

When I do this it gives me the following reply:

qmgr: syntax error - checklist failed
create node 2of12
                  /\

If I do this naming my node with a letter in front (n2of12) then it seems to work and generate the nodes file.

Now if I then go and do the "pbsnodes -a" command it tells me:

n2of12
     state = down
     np = 1
     ntype = cluster

seems fine... should be down since there is no n2of12 in my hosts file.

Now if I then rename the node in the nodes file back to 2of12 and run the following to stop and restart the server:

# qterm
# pbs_server

I get the following reply:

PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start with alpha on line 1.

PBS_Server: PBS_Server, pbsd_init failed

Now I am reluctant to change all of my node names (IP aliases), since everything else about my cluster is finally working well, so I have been trying to find out why pbsd_init will not accept host names that start with numbers. Also, I would hate to change this if it is not the problem.

Does anyone know if I might be able to edit the setup files associated with pbsd_init to get this to work (or any other ways to do this)?

In general, I wouldn't use a digit as the first character, as outlined here:

http://rfc.net/rfc1178.html page 4.

Some programs might simply check the first character to decide whether it's a hostname or a TCP/IP address. Thinking long term, and about additional software on your cluster (maybe even parallel apps), I would suggest changing the names of the machines.
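
To illustrate the kind of shortcut I mean, here is a minimal Python sketch. It is not TORQUE's code; both function names are made up for this example. The first function is the naive "starts with a digit, so it must be an IP address" test, and the second is a leading-letter check similar in spirit to the "doesn't start with alpha" message from pbsd_init.

```python
import string


def naive_is_ip_address(token: str) -> bool:
    """Hypothetical shortcut: anything starting with a digit is taken for an IP address."""
    return bool(token) and token[0] in string.digits


def starts_with_letter(name: str) -> bool:
    """Hypothetical check: accept a node name only if it begins with an ASCII letter."""
    return bool(name) and name[0] in string.ascii_letters


if __name__ == "__main__":
    # Names taken from the thread, plus one made-up IP address for comparison.
    for name in ["2of12", "n2of12", "node01", "192.168.0.2"]:
        print(name, naive_is_ip_address(name), starts_with_letter(name))
```

With such a shortcut, 2of12 is mistaken for an address, while n2of12 and node01 pass the leading-letter check.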

-- Reuti

BTW: Torque has its own mailing list at: http://www.clusterresources.com


Thanks,

Lance

--
Lance S. Jacobsen, Ph.D.
President
GoHypersonic Incorporated
714 E. Monument Ave., Suite 201
Dayton, OH 45402-1382
Tel: 937-531-6678
Fax: 937-531-6679
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
