Hi, Ivan. I'm a nay-sayer in this kind of scenario. I believe your staff time, and the time of your lab users, is too valuable to spend on dual-classing desktop lab machines.
If your lab is underutilized, I would spend staff time on figuring out why, and on how to make the lab a more effective destination for prospective users. If you need more cluster compute time, I would invest funds in additional compute nodes, not in micromanaging the lab machines. Skilled sysadmin time is valuable.

Let's also consider the cost of electricity and cooling. I doubt that the lab machines and their climate control are the most efficient option for full-throttle HPC/HTC computing. Electricity and cooling should be at the top of your list for cost-effective and green computing. I would instead have the lab machines suspend/sleep until they are needed for automated patching or a desktop login (rough sketch at the end of this message).

Also, let's consider the user experience. Cluster users will see jobs killed and restarted; they will not be happy. Lab users will see slow and/or hung machines; they will stop coming to the lab.

Don't get me wrong, this is an interesting project, but one riddled with pitfalls. If the job is to support a computing lab, that should be goal number one.

Cheers.

On Thu, Sep 26, 2013 at 10:00:28AM -0300, Ivan M wrote:
> Hi folks,
>
> I have access to a bunch (around 20) of machines in our lab, each one with
> a particular configuration, usually some combination of Core i5/i7 and
> 4GB/8GB/16GB RAM (the "heterogeneous" part), connected by a 24-port Cisco
> switch with a reasonable backplane. They're end-user machines, but with
> the current lab occupancy only a fraction of them are in constant use, and
> which ones changes every day. They are all running Debian stable. I got an
> idea: why not use the downtime to run some parallel simulations, instead
> of using the university cluster?
>
> The main problems now are:
>
> 1) System administration: for now I'm doing it the clusterssh way to
> update/configure/install new software, but this can be very cumbersome, as
> one of the machines may be in use, so I can't change its configuration,
> and I have to keep track of which ones have changed. Maybe Puppet can help
> here?
>
> 2) Managing resources: knowing which machine is up and available without
> having to shout, and knowing the available configuration to allocate jobs
> that can fit on that particular machine, etc. There are extreme cases
> where a machine needs to be rebooted to run some Windows program.
>
> 3) Migrating jobs (the intermittent part): any machine can be requested by
> a user at any time, so if I have a parallel job running I would have to
> migrate it to another machine, preferably without stopping the other jobs.
> We are running mostly ROMS over MPI and some in-house simulations that use
> a combination of OpenMP and MPI.
>
> Does anyone have any experience or pointers on how to address these
> issues? It seems a waste not to use those idle machines...
>
> Ivan Marin
> http://scholar.google.com.br/citations?user=faM0PCYAAAAJ

--
Gavin W. Burris
Senior IT Project Leader
Research Computing
Wharton Computing
University of Pennsylvania

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
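P.S. On the suspend/wake idea: a minimal sketch of the wake side, assuming Wake-on-LAN is enabled in each machine's BIOS/NIC and that the machines suspend on idle (e.g. via pm-suspend from a cron job on Debian). The MAC and broadcast address below are placeholders, not real lab values; the etherwake package does the same thing from the shell.

    import socket

    def wake(mac, broadcast="192.168.1.255", port=9):
        """Send a Wake-on-LAN magic packet: 6 bytes of 0xFF followed
        by the target MAC address repeated 16 times."""
        mac_bytes = bytes.fromhex(mac.replace(":", ""))
        packet = b"\xff" * 6 + mac_bytes * 16
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            s.sendto(packet, (broadcast, port))

    # Hypothetical MAC address of a lab machine.
    wake("00:11:22:33:44:55")

A patching run then becomes: wake the machines, wait for ssh to answer, apply updates, and let them drift back to sleep.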
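P.P.S. On Ivan's point 2 (knowing which machine is up and available without having to shout): a rough polling sketch, assuming passwordless ssh to each node, as clusterssh already implies. The host names are hypothetical.

    import subprocess

    # Hypothetical host list; substitute the real lab machine names.
    NODES = ["lab01", "lab02", "lab03"]

    def probe(host):
        """Return (1-minute load average, logged-in user count) for a
        host, or None if it is down, suspended, or unreachable."""
        try:
            out = subprocess.check_output(
                ["ssh", "-o", "ConnectTimeout=3", host,
                 "cat /proc/loadavg; who | wc -l"],
                timeout=10).decode()
            loadavg, users = out.splitlines()
            return float(loadavg.split()[0]), int(users)
        except (subprocess.SubprocessError, OSError):
            return None

    for node in NODES:
        info = probe(node)
        if info is None:
            print(node + ": down/suspended")
        else:
            load, users = info
            state = "in use" if users else "idle"
            print("%s: up, load %.2f, %s" % (node, load, state))

This only answers "which machines are free right now"; it does nothing for the preemption problem, which is the part I would not try to solve by hand.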