On Wed, 25 Apr 2007, Ashley Pittman wrote:

> I'm not sure I follow, surely using PIO over DMA is a lose-lose
> scenario?  As you say conventional wisdom is offload should win in this
> situation...
>
> Mind you we do something completely different for larger messages which
> rules out the use of PIO entirely.

Sorry -- I mean that, in line with the goal of spending the least amount
of time in MPI, there's no obvious answer for protocol breaks and send
mechanisms.

> I'm sure I've seen a benchmark like this before, something that measured
> the latency of messages and then sees how much "work" can be done before
> latency increases, in effect measuring the CPU overhead of a send.
> Quadrics tends to look good when these figures are presented as absolute
> numbers and bad when presented as % of latency by virtue of having lower
> latency to start with.  I was recently asked to improve the percentage
> figure and the best I could come up with was to put a sleep(1) on the
> critical path.  I'm not sure if it is or not but if it is the GASNet
> benchmark I'm thinking of could you change the way it reports results
> please?

Funny that you mention percentage of time, because the same argument
applies to using PIO sends for a "largish" message.  In relative terms
of CPU availability it doesn't look that good, but in terms of absolute
time spent in the send it's not all that bad.

GASNet should compile out-of-the-box -- in fact, I think Dan has it to
the point where you can just do './configure && gmake run-tests' and it
will compile, link and submit tests with prun (and run testqueue).  None
of the GASNet tests have relative figures in them, so you must be
thinking of another test or suite of tests.

If absolute numbers in microbenchmarks require scrutiny as to the
methodology used, I think that's even more true of relative numbers.  If
relative metrics are shown, absolute metrics *must* be shown too, since
relative numbers can be used to effectively hide elapsed time and the
true cost of an operation.  For example: "This optimization caused a 30%
speedup on NAS CG/FT/EP on 256 processors" (and the graphs/results fail
to mention that the smallest problem size class A is being used).  You'd
think this is obvious, but you can find these types of omissions in
published material.
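
To make the relative-vs-absolute point concrete, here is a rough sketch
in Python.  The function name and all of the numbers are made up for
illustration; they are not measurements of Quadrics, PIO/DMA sends, or
any other real interconnect.  It only shows how the same kind of
overhead figure reads when reported in microseconds versus as a
percentage of latency, and why padding the latency "improves" the
percentage.

```python
def overhead_report(latency_us, cpu_overhead_us):
    """Return (absolute CPU overhead in us, overhead as a fraction of latency)."""
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return cpu_overhead_us, cpu_overhead_us / latency_us


# Hypothetical numbers, not measurements of any real hardware.
low_latency = overhead_report(latency_us=2.0, cpu_overhead_us=1.0)
high_latency = overhead_report(latency_us=10.0, cpu_overhead_us=3.0)

# The sleep(1) "optimization": add one second (1e6 us) to the latency of
# the low-latency case while the CPU overhead stays the same.
padded = overhead_report(latency_us=2.0 + 1_000_000.0, cpu_overhead_us=1.0)

for name, (abs_us, rel) in (
    ("low-latency", low_latency),
    ("high-latency", high_latency),
    ("low-latency + sleep(1)", padded),
):
    print(f"{name}: {abs_us:.1f} us on the CPU, {rel:.4%} of latency")
```

With these made-up inputs the low-latency case spends less absolute
time on the CPU (1 us vs 3 us) but looks worse as a percentage (50% vs
30%), and the padded case looks best of all as a percentage while being
by far the slowest.  Hence the request to show the absolute figure next
to any relative one.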

christian
-- 
[EMAIL PROTECTED]
(QLogic SIG, formerly Pathscale)
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf