Hi,
On 26.03.2006 at 21:07, James Rustad wrote:
Guys
This is a strange question, but is there any way to disable a bad
node in PBS without being the system administrator?
I am lining up about 50 jobs in the queue and they fail sequentially
when they hit the bad node. This often seems to happen on the
weekends when nobody is around to reboot the node.
Can I specify within PBS "don't use node015" or something like that?
Thanks
Jim Rustad
PS: I may be using TORQUE rather than PBS, by the way.
Although I can't answer your question directly: what is causing this
black hole in the cluster? I have run into this from time to time
when /tmp filled up on some nodes. As we are using SGE, I use its
load-sensor facility to check the free space there; if it runs too
low, the node is put into alarm state, i.e. the queues on this node
are disabled. Maybe something similar could be implemented with
Torque as well, to get some self-healing at weekends.
- Reuti
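
To make the load-sensor idea above concrete, here is a minimal sketch in Python, not the actual script in use. It assumes the usual SGE load-sensor protocol: the execd writes a line to the sensor's stdin whenever it wants a report, the sensor answers with a begin / host:name:value / end block, and the word "quit" tells it to exit. The complex name "tmpfree" is hypothetical; it would have to be defined in the complex configuration and given a load threshold on the queue before the alarm state could trigger.

```python
#!/usr/bin/env python3
"""Sketch of an SGE-style load sensor reporting free space in /tmp."""
import shutil
import socket
import sys


def tmp_free_mb(path="/tmp"):
    """Return the free space on the filesystem holding `path`, in MiB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def format_report(host, name, value):
    """Build one load report block in the begin/end format."""
    return f"begin\n{host}:{name}:{value}\nend\n"


def run(stdin=sys.stdin, stdout=sys.stdout, host=None, path="/tmp"):
    """Answer each request line with a report until 'quit' or EOF."""
    host = host or socket.gethostname()
    # readline() avoids read-ahead buffering, so each request is
    # answered as soon as it arrives.
    for line in iter(stdin.readline, ""):
        if line.strip() == "quit":
            break
        # "tmpfree" is a hypothetical complex name (an assumption).
        stdout.write(format_report(host, "tmpfree", tmp_free_mb(path)))
        stdout.flush()


if __name__ == "__main__":
    run()
```

The threshold itself is deliberately kept out of the script: the sensor only reports a number, and the decision to put the node into alarm state is left to the scheduler configuration.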
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf