Agree completely, and let's remember that the best algorithm is directly related to the hardware platform it is implemented on. I have worked with an algorithm specialist who has a unique understanding of how code gets converted into the electrons that execute it. This relationship is becoming more apparent as companies integrate FPGA technology into their solutions, and let's not forget RAM disk, interconnect, storage technologies, and others. A person who has mastered this relationship is truly a rare bird. But that is the academic side of the question. The practical side is the cost/performance metric (which includes the profit motive in most cases), and more and more it is easier to throw low-cost hardware at the problem than to pursue the more elegant solution, at least at the macro level. When you get to the top sites, however, this seems to go completely in reverse, as the National Labs, etc., need the more elegant solution to solve the 'grand challenge' problems.
I've also known some software guys who think they are one step from the 'throne' just because they can write some code and it executes.

Bill Harman
P - (801) 572-9252
F - (801) 571-4927
[EMAIL PROTECTED]
------ High Performance Computing Solutions ------

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Toon Knapen
Sent: Friday, September 15, 2006 8:05 AM
To: Patrick Geoffray
Cc: 'Beowulf List'; Mark Hahn
Subject: Re: [Beowulf] cluster softwares supporting parallel CFD computing

I agree that in general the quality of the parallelism in most codes is rather low, unfortunately. But it is hard to prove that much can be gained by improving that quality. Let me elaborate.

When developing an app that needs to run fast, one first needs to look at using the best algorithm to get the job done. While implementing the algorithm, attention must be paid to the app being stable (no use in having a fast app which crashes the whole time). And finally you start optimizing. But while using a better algorithm might give you a 50% boost, performance increases due to code optimization are generally only marginal. Basically, changes early in the development process will have a big effect on performance, while changes late in the development process will have minor effects.

For instance, I wonder if any real-life application got a 50% boost by just changing the switch (and the corresponding MPI implementation). Or, what exactly is the speedup observed by switching from switch A to switch B on a real-life application?

toon

Patrick Geoffray wrote:
> Hi Mark,
>
> Mark Hahn wrote:
>> all these points are accurate to some degree, but give a sad
>> impression of the typical MPI programmer. how many MPI programmers
>> are professionals, rather than profs or grad students just trying to
>> finish a calculation?
>> I don't know, since I only see the academic side.
>
> I think that the sample of MPI codes or traces that I have seen so far
> is a good representation of the academic, labs and commercial sides.
> It's pretty bad. I am sure there are many reasons, but a few come to mind:
>
> * a lot of codes in academia and at the labs are written directly by
> the scientist, physicist, chemist, whatever. They are experts in their
> domain, but they don't know how to write good code. Doesn't matter if
> it's parallel or sequential, they don't know how to do it right. In
> their defense, they never really learned, and they are doing the best
> they can. However, they really should work with professional
> programmers. It's paradoxical that physicists would use the services of
> a statistician to help them make sense of their experimental data, but
> they don't want help for computer science.
> It's interesting to note that there has always been this push from
> high in the food chain to bypass the human computer science expertise:
> it was automagic compilers (OpenMP, HPF and family) in the past, it's
> "high-productivity" languages now.
>
> * In the commercial side, the codes are quite old, at least in their
> design. You can see traces of a port from SHMEM to MPI, with Barriers
> a-lot-and-often. You see collective communications done by hand, I
> guess because the implementation of the collectives sucked at the
> time. You see a shameful amount of unexpected messages, the kind
> where the receive is just a little too late, typical of a code that
> was designed for a slow network, relatively. In short, it looks like
> they minimize the investment in code maintenance.
>
>
>> for academics, time-to-publish is the main criterion, which doesn't
>> necessarily mean well-designed or tuned code. taking a significant
>
> I don't know if time is really the constraint here. For grad
> students, sure, but I would not think that more time would help with
> profs.
> A good programming book maybe, but they are too proud to read
> those :-)
>
> Patrick

--
Toon Knapen
------------------------------------------------
Check out our training program on acoustics
and register on-line at http://www.fft.be/?id=35

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf