On Aug 26, 2008, at 10:34 PM, Tim Cutts wrote:


On 26 Aug 2008, at 2:29 pm, Perry E. Metzger wrote:

I think part of the issue is that most people doing scientific
computing don't have computer science backgrounds, which is a
shame.

There is an unwritten recruitment rule, certainly in my field of science, that the programmer "must understand the science", and actually being able to write good code is very much a secondary requirement.

I couldn't disagree more.

Maybe your judgement is not objective.

For any serious software, let's be objective. Only a few people will learn to program really well and manage to find their way in complex codes. It often takes ten years or so to learn. Someone who has a PhD or Master's or whatever in some science is usually capable of explaining and understanding things.

Becoming a very good low-level programmer is a lot harder than learning a few more algorithms that can solve a specific problem. Understanding how to write efficient parallel code is especially hard. Last week I spoke with a guy who had figured out some stuff that was used on supercomputers in the 90s, and he concluded it was very inefficient.

Those were *professors* in computer science and math who were involved in that.

A single good low-level programmer can often speed things up by a factor of 50.

In fact I remember a statement from a programmer who was hired in Germany to do some physics work, and after a few years he managed a speedup of a factor of 1000+ over the original software. In fact it was more than a factor of 1000; it was an exponential speedup.

In his opinion: "getting a speedup less than a factor 1000 in scientific number crunching software you can do with your eyes closed".

Knowing everything about efficient caching and hashing, and how to divide that over the nodes without paying the full latency or losing a factor of 50+ to MPI messaging alone, is simply a full-time expertise in itself. There are far fewer people who can do that than the huge number of people who can explain the field's stuff to you.
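As a rough illustration of what "dividing hashed data over the nodes" without drowning in per-item messages can mean, here is a minimal sketch in plain Python. It is not MPI code and not anyone's actual software: the names `owner_of` and `batch_by_owner`, the key format, and the choice of 16 nodes are all assumptions made for the example. It only shows the idea of hashing each key to an owning node and grouping the keys so that each node gets one batched message instead of one message per key.

```python
import zlib
from collections import defaultdict


def owner_of(key: str, num_nodes: int) -> int:
    """Map a key to the node that owns it (hypothetical helper).

    crc32 is used because it gives the same value on every node and run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def batch_by_owner(keys, num_nodes):
    """Group keys by owning node, so each node receives a single batch."""
    batches = defaultdict(list)
    for key in keys:
        batches[owner_of(key, num_nodes)].append(key)
    return dict(batches)


# Made-up workload: 10,000 keys spread over an assumed 16 nodes.
keys = [f"position-{i}" for i in range(10_000)]
batches = batch_by_owner(keys, num_nodes=16)

print("messages if sent one per key:", len(keys))
print("messages if batched per node:", len(batches))
```

With per-message latency dominating small transfers, cutting the message count this way is one of the simpler things that separates a fast parallel code from a slow one; a real implementation would also have to overlap communication with computation.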

Note that bioinformatics is a bad example to mention: it eats a grand total of < 0.5% of system time at supercomputers, and that is already system time that hardly gets used in an efficient manner. There is just not much to calculate there, compared to math, physics and everything that has to do with the weather, from the climate X years from now to earthquake prediction.

Physics by itself eats 50%+ of all supercomputer time.

I think this grew out of the last 20 years of exponentially increasing computer power which meant that in many fields you could write crappy code and just wait for hardware improvements to make it faster. This is particularly the case in fields such as bioinformatics where the field came into existence since the days of very limited memory and very slow machines, so they never experienced the world when writing tight code was essential (I started programming in 1984, so I can barely remember those days either). This is further hindered by the fact that no-one doing a masters in Bioinformatics learns a compiled language. They learn things like Java, R, perl, python and ruby.

There is light at the end of the tunnel, though. I'm beginning to see signs that people are starting to be hired primarily as programmers, and not scientists. This is usually in areas where the scientists have hit a brick wall in terms of performance, and with exponentially increasing data quantities, had nowhere else to go. I expect this to gradually expand over the next couple of years, but there's going to be a lot of pain in the meantime - particularly for those of us building and running the systems, who will tend to get the blame when we supply a 10,000 core cluster and the scientists find their code doesn't run any faster than it does on the current 1,000 core system. "It's a more powerful system, it must be your fault it's not working"
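One common explanation for code that runs no faster on 10,000 cores than on 1,000 is Amdahl's law: if a fraction s of the run time is serial, the speedup on N cores is at most 1 / (s + (1 - s) / N). The sketch below just evaluates that formula. The 1% serial fraction is an assumed figure for illustration, not a measurement of any real code, and `amdahl_speedup` is a name made up for this example.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup from Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


serial_fraction = 0.01  # assumption: 1% of the run time cannot be parallelised

for cores in (1_000, 10_000):
    print(cores, "cores:", round(amdahl_speedup(serial_fraction, cores), 1), "x")
```

Under that assumption the bound is about 91x on 1,000 cores and about 99x on 10,000 cores, so ten times the hardware buys less than 10% more speed, before any communication overhead is counted.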

Tim


--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf