On 10/21/2011 11:24 AM, Reuti wrote:
> Hi,
>
> On 21.10.2011 at 15:10, Prentice Bisbal wrote:
>
>> Beowulfers,
>>
>> I have a question that isn't directly related to clusters, but I suspect
>> it's an issue many of you are dealing with or have dealt with: users using
>> the screen command to stay logged in on systems and running long jobs
>> that they forget about. Have any of you experienced this, and how did you
>> deal with it?
>>
>> Here's my scenario:
>>
>> In addition to my cluster, we have a bunch of "compute servers" where
>> users can run their programs. These are "large" boxes with more cores
>> (24-32 cores) and more RAM (128-256 GB, ECC) than they'd have on a
>> desktop.
>>
>> Periodically, when I have to shut down/reboot a system for maintenance,
>> I find a LOT of shells being run through the screen command for users
>> who aren't logged in. The majority are idle shells, but many are running
>> jobs that seem to be forgotten about. For example, I recently found some
>> jobs running since July or August under the account of someone who hasn't
>> even been here for months!
>>
>> My opinion is that these are shared resources, and if you aren't
>> interactively using them, you should log out to free up resources for
>> others. If you have a job that can be run non-interactively, you should
>> submit it to the cluster.
>>
>> Has anyone else here dealt with this problem?
>>
>> I would like to remove screen from my environment entirely to prevent
>> this. My fellow sysadmins here agree. I'm expecting massive backlash
>> from the users.
>
> I disallow rsh to the machines and limit ssh to admin staff. Users who want
> to run something on a machine have to go through the queuing system to get
> access to a node granted by GridEngine (for the startup method you can use
> either the -builtin- method or [in case you need X11 forwarding] a separate
> sshd_config and ssh [GridEngine will start one daemon per task]; one
> additional step is necessary for a tight integration of ssh).
>
> For users just checking their jobs on a node I have a dedicated queue
> (where they can always log in, but with h_cpu limited to 60 seconds, i.e.
> they can't abuse it).
>
> -- Reuti
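(For concreteness, a setup along the lines Reuti describes might look roughly
like the sketch below. The queue name login.q, the group sysadmin, and the
host node042 are made-up placeholders, not details from his actual
configuration, and the exact limits will differ per site.)

    ## sshd_config on the compute nodes: only admin staff may ssh in directly
    AllowGroups sysadmin

    ## a dedicated "check on my job" queue: always reachable, but time-limited
    qconf -aq login.q                               # create the queue
    qconf -mattr queue qtype INTERACTIVE login.q    # interactive logins only
    qconf -mattr queue h_cpu 0:1:0 login.q          # 60 CPU seconds, then killed

    ## ordinary users reach a node only through the scheduler, e.g.
    qlogin -q login.q -l hostname=node042

The tight ssh integration Reuti mentions (so that ssh-started tasks stay under
GridEngine's control and accounting) needs the extra per-host setup he refers
to; the sketch above glosses over that step.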
Reuti,

That was EXACTLY my original plan, but for reasons I don't want to get into,
I can't implement it. In fact, just yesterday I ripped out all the SGE queues
I had configured to do exactly that. Why? Because I was tired of seeing them
and being reminded of what a good idea it was. :(

--
Prentice