Hi,
On 13.04.2008 at 04:52, Lance S. Jacobsen wrote:
I recently put together a small cluster of Xeons using CentOS 5.1
x86_64. This cluster is my first substantial experience with Linux
administration. It took some learning to install NIS, NFS, etc.,
but now the machines seem to be working well, so I am working on
the next step: installing a queue scheduler. I decided on TORQUE
2.3.0 since it's free and I don't know any better. I have installed
it and am having trouble getting it to detect my nodes. I think the
problem is that I gave them names starting with numbers in my
/etc/hosts file: 1of12, 2of12, ... 12of12, instead of something
like node01, node02, ...
After the installation, TORQUE did not create the 'nodes' file that
it told me I needed, so after searching the web I found the command
to create it:
# qmgr -c "create node 2of12"
When I do this it gives me the following reply:
qmgr: syntax error - checklist failed
create node 2of12
/\
If I name my node with a letter in front (n2of12), the command
seems to work and generates the nodes file.
If I then run the "pbsnodes -a" command, it tells me:
n2of12
state = down
np = 1
ntype = cluster
That seems fine; it should be down, since there is no n2of12 in my
hosts file.
If I then rename the node in the nodes file back to 2of12 and type
the following to stop and restart the server:
# qterm
# pbs_server
I get the following reply:
PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start
with alpha on line 1.
PBS_Server: PBS_Server, pbsd_init failed
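The error message suggests that each node name in the nodes file must begin with a letter. As a rough sketch only, the check below applies that same rule to the text of a nodes file so offending names can be spotted before restarting pbs_server. The function name `invalid_node_names` and the sample entries are made up for illustration, and this is not TORQUE's actual parsing code.

```python
# Hypothetical helper: applies the rule implied by the pbsd_init error
# message (a node name token must start with an alphabetic character).
# It is an illustration, not TORQUE's own parser.

def invalid_node_names(nodes_text):
    """Return (line_number, name) pairs whose name does not start with a letter."""
    bad = []
    for lineno, line in enumerate(nodes_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        name = stripped.split()[0]  # first token on the line is the node name
        if not name[0].isalpha():
            bad.append((lineno, name))
    return bad


# Assumed sample content in the "name np=N" style; adjust to the real file.
sample = "2of12 np=1\nn3of12 np=1\n"
print(invalid_node_names(sample))
```

Any name reported here would presumably trigger the same "doesn't start with alpha" failure at server start.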
I am reluctant to change all of my node names (IP aliases), since
everything else about my cluster is finally working well, so I have
been trying to find out why pbsd_init will not accept host names
that start with numbers. I would also hate to change them if this
is not the problem.
Does anyone know whether I can edit the setup files associated with
pbsd_init to get this to work, or whether there is another way to
do this?
In general I wouldn't use a digit as the first character, as
outlined here:
http://rfc.net/rfc1178.html page 4.
Some programs might simply check the first character to decide
whether it's a hostname or a TCP/IP address. Thinking long term, and
considering additional software in your cluster (maybe even parallel
apps), I would suggest changing the names of the machines.
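To make the first-character shortcut concrete, here is a minimal sketch comparing such a naive check with a stricter one based on Python's standard `ipaddress` module. The function names `naive_is_ip` and `strict_is_ip` are invented for this illustration and do not come from TORQUE or any other cluster software.

```python
import ipaddress


def naive_is_ip(token):
    # The shortcut described above: anything starting with a digit
    # is assumed to be an address.
    return token[:1].isdigit()


def strict_is_ip(token):
    # Let the standard library decide whether the token really parses
    # as an IPv4 or IPv6 address.
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


for token in ("2of12", "n2of12", "192.168.0.2"):
    print(token, naive_is_ip(token), strict_is_ip(token))
```

A name like 2of12 is taken for an address by the naive check but not by the strict one, which is the kind of mismatch that a letter-first naming scheme avoids.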
-- Reuti
BTW: Torque has its own list at: http://www.clusterresources.com
Thanks,
Lance
--
Lance S. Jacobsen, Ph.D.
President
GoHypersonic Incorporated
714 E. Monument Ave., Suite 201
Dayton, OH 45402-1382
Tel: 937-531-6678
Fax: 937-531-6679
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf