To borrow from an old joke, I'd say the short answer is "No," and the long answer? "Nooooooooooo."

Reproducibility is an interesting issue - on the surface, it seems like a binary thing: something is or is not reproducible. In reality, though, things are almost never duplicated exactly, and there exists some fuzzy threshold at which point things are considered good enough to be a reproduction. I can go down to a local store and buy a print of the Mona Lisa and, to me, it might be a really great reproduction, yet even writing that sentence has some art critic screaming in agony. Similarly, in computing, if I run some model on two different systems and get two different results, that can either be indicative of a potential issue, or it can be completely fine because those differences fall below a certain threshold and thus the runs were, in scientific terms, 'reproducible' with respect to each other.
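As a crude illustration - and the tolerances here are entirely made up - that 'reproducible with respect to each other' test usually boils down to a few lines of C along these lines, where somebody has to pick the thresholds and be able to defend them:

    #include <math.h>
    #include <stdio.h>

    /* Two sets of results "agree" if every element is within a relative
     * (or small absolute) tolerance of its counterpart. */
    static int results_agree(const double *a, const double *b, size_t n,
                             double rel_tol, double abs_tol)
    {
        for (size_t i = 0; i < n; i++) {
            double diff  = fabs(a[i] - b[i]);
            double scale = fmax(fabs(a[i]), fabs(b[i]));
            if (diff > fmax(rel_tol * scale, abs_tol))
                return 0;   /* outside the agreed-upon threshold */
        }
        return 1;
    }

    int main(void)
    {
        double run1[] = { 1.0000000000, 2.5000000000, 3.1415926535 };
        double run2[] = { 1.0000000001, 2.4999999999, 3.1415926536 };

        printf("reproducible? %s\n",
               results_agree(run1, run2, 3, 1e-9, 1e-12) ? "yes" : "no");
        return 0;
    }

Whether 1e-9 is the right number for a given code is exactly the part that tends to get hand-waved away.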

On a small scale (meaning a lab, code or project), this is a key issue - I've seen grad students and faculty alike be dismayed by trivial differences, and when this happens, more often than not the mentality is, "My first results are correct - make this code give them back to me", without understanding that the later, different results are quite possibly equally valid, if not more so. Back in the early Beowulf days, I remember switching some codes from an RS/6000 platform to an x86-based one, and the internal precision of the x86 FPU was 80 bits, not 64, so sequences of FP math could produce small differences unless extended precision was specifically disabled via compiler switches. Which a lot of people did, not because the situation was carefully considered, but because with extended precision on, the code gave 'wrong' results. Another example would be an algorithm that was orders of magnitude faster than the one previously in use, but wasn't adopted because ultimately the results were different. The catch here? Reordering the input data while still using the original algorithm gave similarly different answers - the nature of the code was that single runs were useless, and ensemble runs were a necessity.
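For anyone who never ran into that particular quirk, here's a contrived little C sketch of the effect. The exact behaviour depends on the compiler, the flags and the hardware, but on the old x87 FPU the intermediate value below could be held at 80 bits unless something like gcc's -ffloat-store (or, later, SSE math via -mfpmath=sse) forced it back down to 64:

    #include <stdio.h>

    int main(void)
    {
        volatile double a = 1.0e16;
        volatile double b = 2.9999;

        /* With 80-bit x87 intermediates, the small part of b can survive
         * the addition and the result prints as roughly 2.9999; with every
         * intermediate rounded to a 64-bit double it prints 2.0000.
         * Neither answer is "wrong" - they're just different roundings of
         * the same arithmetic. */
        double t = a + b;
        double result = t - a;

        printf("result = %.4f\n", result);
        return 0;
    }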

Ultimately, the issues here come down to the common perception of computers - "They give you THE answer!" - versus the reality of computers - "They give you AN answer!" - with the latter requiring additional effort to provide some error margin or statistical analysis of results. That extra effort happens in certain computational disciplines far more often than in others.
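A minimal sketch of what treating the output as 'AN answer' looks like in practice - the numbers below are placeholders, but the idea is simply to run the thing several times (different seeds, different input orderings, whatever actually varies) and report a mean and a spread instead of a single magic value:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Stand-in values: the final figure of merit from five runs. */
        double runs[] = { 42.01, 41.97, 42.05, 42.00, 41.99 };
        size_t n = sizeof runs / sizeof runs[0];

        double mean = 0.0;
        for (size_t i = 0; i < n; i++)
            mean += runs[i];
        mean /= (double)n;

        double var = 0.0;
        for (size_t i = 0; i < n; i++)
            var += (runs[i] - mean) * (runs[i] - mean);
        var /= (double)(n - 1);              /* sample variance */

        printf("result = %.3f +/- %.3f (n = %zu)\n", mean, sqrt(var), n);
        return 0;
    }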

On the larger scales - whether reproducibility is an issue in scientific /fields/ - again, I'd say the answer is no. The scientific method is resilient, but it never made any claims to be 'fast'. Would it speed things up to have researchers publish their code and data? Probably. Or, rather, it'd certainly speed up the verification of results, but it might also inhibit new approaches to doing the same thing. Some people here might recall Michael Abrash's "Graphics Programming Black Book", which had a wonderful passage about a word-counting program. It focused explicitly on performance tuning, with the key lesson being that nobody thought there was a better way of doing the task... until someone showed there was. And that led to a flurry of new ideas. Similarly, having software that does things in a certain way often convinces people that that is THE way of doing things, whereas if they knew it could be done but not how, newer methods might develop. There's probably some happy medium here, since having so many different codes, most of them with a single author who isn't a software developer by training, seems less efficient and flexible than a large code with good documentation, a good community, and the ability to fold in many of the methods previously scattered across those one-off codes.

In other words, we can probably do better, but science itself isn't threatened by the inefficiency in verifying results, or even by bad results - in the absolute worst case, with incorrect ideas being laid down as the foundation for new science and no checking done on them, progress will happen until it can't... at which point people will backtrack until they discover the underlying principle they wrongly thought was correct, and fix it. The scientific method is a bit like a game of chutes and ladders in this respect.

Ultimately, in a lot of ways, I think computational science has it better than other disciplines. There was news earlier this week [1] about problems reproducing some early-stage cancer research - specifically, Amgen tried to reproduce 53 'landmark' conclusions, and was only able to do so for 11% of them. Again, that's OK - it will correct itself, albeit in slow fashion - but what's interesting here is that these sorts of experiments, especially those involving mice (and often other wet-lab methods), don't have something like Moore's Law making them more accessible over time. To reproduce a study involving the immune system of a mouse, I need mice. And I need to wait the proper number of days. Yet with computational science, what today may take a top-end supercomputer can probably be done in a few years on a departmental cluster. A few years after that? Maybe a workstation. In our field, data doesn't really change or degrade over time, and the ability to analyze it in countless different ways becomes more and more accessible.

In short (hah, nothing about this was short!), can we do better with our scientific approaches? Probably. But is the scientific method threatened by computation? Nooooooooo. :-)

That's my two cents,
  - Brian

[1] http://vitals.msnbc.msn.com/_news/2012/03/28/10905933-rethinking-how-we-confront-cancer-bad-science-and-risk-reduction Or, more directly (if you have access to Nature) : http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

(PS. The one thing that can threaten science is a lack of education - it can decrease the signal-to-noise ratio of 'good' science, amongst other things. That's a whole essay in itself.) (PPS. This was a long answer, and yet not nearly long enough... but I didn't want to be de-invited from future Beowulf Bashes by writing even more!)


On 3/29/2012 7:58 AM, Douglas Eadline wrote:

I am glad someone is talking about this. I have wondered
about this myself, but never had a chance to look into it.


http://www.isgtw.org/feature/does-computation-threaten-scientific-method


