Hi,

On 26.03.2006 at 21:07, James Rustad wrote:

Guys,
This is a strange question, but is there any way to disable a bad node in PBS without being the system administrator? I am lining up about 50 jobs in the queue, and they fail one after another when they hit the bad node. This often seems to happen on weekends, when nobody is around to reboot the node.

Can I specify within PBS "don't use node015", or something like that?
Thanks,
Jim Rustad
P.S. I may be using TORQUE rather than PBS, by the way.

Although I can't answer your question directly: what is causing this black hole in the cluster? I have run into this from time to time when /tmp filled up on some nodes. As we are using SGE, I use its load-sensor facility to check the free space there and put the node into alarm state when it runs low, i.e. disabling the queues on that node. Maybe something similar could be implemented with Torque too, to get some self-healing on weekends.

-- Reuti
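A minimal sketch of the kind of SGE load sensor described above, written in Python. It assumes the load-sensor exchange used by SGE's execd: execd writes a line to the sensor's stdin each load interval (the line "quit" means shut down), and the sensor answers with "begin", one "host:complex:value" line per value, and "end". The complex name tmp_free, its integer-megabyte unit, and the watched directory are hypothetical; an administrator would still have to define the complex and set a load threshold on the queue so that a low value actually puts it into alarm state.

```python
import shutil
import socket
import sys

# Hypothetical complex name and unit (integer MB); an admin would have to define it in SGE.
COMPLEX_NAME = "tmp_free"
WATCHED_DIR = "/tmp"


def free_megabytes(path: str) -> int:
    """Return the free space on the filesystem holding `path`, in whole megabytes."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def report(host: str, free_mb: int) -> str:
    """Format one report block: begin, host:complex:value, end."""
    return f"begin\n{host}:{COMPLEX_NAME}:{free_mb}\nend\n"


def main() -> None:
    host = socket.gethostname()
    while True:
        line = sys.stdin.readline()
        # Stop on EOF or when execd asks the sensor to quit.
        if not line or line.strip() == "quit":
            break
        sys.stdout.write(report(host, free_megabytes(WATCHED_DIR)))
        # Flush so execd sees the report immediately rather than when the buffer fills.
        sys.stdout.flush()


if __name__ == "__main__":
    main()
```

The sensor only reports a number; deciding when a node counts as "bad" stays in the queue's threshold configuration, so the same script could serve several queues with different limits.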


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
