In those famous words from "Cool Hand Luke," "What we have here is a
failure to communicate." For my role in that failure I apologize.
Tony Travis wrote:
I think problems can occur when you enforce such a strict demarcation
boundary between your role and the role of the scientists you support:
If communication between you and the scientists breaks down and you do
not understand what they want to do then you cannot support their work
effectively. The bottom line for me is that it is the objective of the
organisation as a whole to produce research, and your role within the
organisation is to facilitate the work of the scientists who do it.
Agreed. The point that I'm making is that managing a resource for one
individual or group is different than managing it for multiple
individuals or groups. It is my role within the organization to support
the work of ALL of the scientists. That means, for instance, that we
use a batch system. It means that we have limits on the number of jobs
anyone can submit. It does not mean that I don't listen to what other
people need.
I'm a scientist with a Linux box at my house: I also built and manage a
small (64-1p node) openMosix Beowulf cluster for bioinformatics work at
RRI and for the bioinformatics/mathematical work of our sister
organisation BioSS. I don't think I'm exceptional in doing this, but I
do think that having a Linux box at home has been very useful to me in
gaining the experience I needed to manage our Beowulf cluster.
Not 'everyone' like me is as stupid or naive as you imply. I have the
support of an excellent IT department and an electronics workshop who
talk to me and understand very well what I want to do with the Beowulf.
We have about 400 user accounts, which are registered and managed by IT
centrally. I just enable NIS. The IT department also manage the central
filers where precious data files are stored. I manage 3.2 TB of local
RAID on the Beowulf. In my opinion this type of cooperation is a lot
more effective than strict job demarcation...
For the record, I implied no stupidity and no naivete. I don't manage a
machine for an individual or even a department. I manage it for the
institution. I work with individuals to meet their needs. Meeting these
needs has helped us to grow our resources from tens of processors to
hundreds. It has provided researchers the resources to win millions of
dollars in grants that we couldn't have competed for without the
cooperation that we've built.
Seems to me that it would be straightforward to know this if you use a
package management system like apt or rpm, which keeps track of what's
installed and what the dependencies are. However, I also think that it's
quite right that you should know more about this than him. In an ideal
world, you should both make the decision about what to do on a rational
basis. I doubt that he asked you to do it for no reason at all.
I don't work with much scientific software that is available in rpm
form. Some of it is binary. But most is compiled specifically for our
machines using high-performance compilers from Absoft, Intel, PathScale
or the Portland Group. So apt and rpm don't solve the problem. After
discussing the issue with the PI, we discovered that she didn't need the
most recent version (as I think that I noted in my original reply). The
version that was installed would work.
Most of the problems I've come across like this arise from a lack of
communication. I believe it's quite important for you to know why he
wanted to do the upgrade, and for you to inform him about any problems
or conflicts of interest that would result from the upgrade. Presumably,
that is exactly what you did. My only complaint here is the impression
you give that scientists like me want to upgrade software just for the
sake of doing it. Please ask yourself why did the upstream maintainers
release a new version? Was it just for the sake of upgrading it?
The person who wanted the upgrade was not the PI; it was a member of our
staff. Once I had asked him to research the needed changes and had talked
with the PI, it turned out that the upgrade was not necessary.
I do advocate upgrading unless there is a reason *not* to
do it. You seem to recommend the opposite: not upgrading unless there
*is* a reason to do it. I wonder which strategy results in less work?
Finally, in general, the uptime of my clusters is measured in years. My
original cluster, purchased from Paralogic, ran with downtime of less
than 3 hours in 5 years. I think that works out to about 99.993% uptime.
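
A rough check of that arithmetic, as a minimal sketch: it assumes 365-day
years and treats the full 3 hours as downtime, and the function name is
only illustrative. Under those assumptions, 3 hours out of 43,800 comes
to roughly 99.993% uptime.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day years, leap days ignored


def uptime_percent(downtime_hours: float, years: float) -> float:
    """Return uptime as a percentage of the total elapsed hours."""
    total_hours = years * HOURS_PER_YEAR
    return 100.0 * (1.0 - downtime_hours / total_hours)


if __name__ == "__main__":
    # 3 hours of downtime over 5 years of operation
    print(f"{uptime_percent(3, 5):.3f}% uptime")
```

Since the actual downtime was less than 3 hours, the real figure would be
slightly higher than what this prints.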
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf