On 08/20/2013 02:03 PM, Mark Hahn wrote:
>> 1. Many people's job scripts use ssh either directly (to, say, clean up
>> /tmp) or indirectly from mpirun.
>
> sure.
>
>> (good mpirun's use the batch engine's per-node daemon to launch the
>> binaries, not ssh).
>
> why should a scheduler have daemons cluttering up compute nodes?
> also, do you really launch so many very big but very short jobs
> that startup time is a serious concern? I'm questioning assumptions here.
It's not about reducing startup time; it's about controlling jobs accurately. With SGE, every job process is a child of an sge_shepherd process (which is started by the sge_execd daemon). This gives SGE ultimate control over jobs (suspend/resume, kill, etc.) and provides accurate process accounting. When you just use SSH, can you provide that level of control?

And if you really want to be a pro, you create a prolog script that creates a unique temp directory for the user, which they can reference through an environment variable (say, SCRATCH, or SGE_TMP, or something similar) while their job is running, and then an epilog script that automatically cleans it up after the job completes. If your users must clean up /tmp themselves after jobs are done, something is not right.

Prentice

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
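P.S. A minimal sketch of what such a prolog/epilog pair might look like. The /scratch root, the SCRATCH variable name, and the function layout are my assumptions for illustration, not SGE defaults (SGE also has its own per-job TMPDIR mechanism); JOB_ID and USER are set in the prolog/epilog environment by the execd.

```shell
#!/bin/sh
# Hypothetical prolog/epilog pair for per-job scratch directories.
# /scratch is an assumed local filesystem -- substitute your own.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

prolog() {
    # Create a private scratch directory; the job script can then
    # reference it through $SCRATCH (exported to the job environment).
    SCRATCH="${SCRATCH_ROOT}/${USER}/${JOB_ID}"
    mkdir -p "$SCRATCH"
    chmod 700 "$SCRATCH"
}

epilog() {
    # Remove the directory when the job completes -- users never have
    # to ssh back in and scrub /tmp themselves.
    rm -rf "${SCRATCH_ROOT}/${USER}/${JOB_ID}"
}
```

You would point the queue's prolog/epilog settings at scripts like these so they run as root (or the job owner) on every execution host around each job.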