Mark Hahn wrote:
>>>> desktop (32 bit PCI) cards. I managed to get 14.6 HPL GFLOPS
>>>> and 4.35 GROMACS GFLOPS out of 8 nodes consisting of hardware
>>> ...
>>>> As a point of reference, a quad opteron 270 (2GHz) reported
>>>> 4.31 GROMACS GFLOPS.
>>> that's perplexing to me, since the first cluster has semp/2500's,
>>> right? that's a 1.75 GHz K8 core with 128K L2 and 64b memory
>>> interface. versus the same number of 2.0 GHz, 1M cores each with
>>> 4x 128b memory. I really wouldn't expect them to be that close -
>>> any speculation on why GROMACS runs so poorly on the much better
>>> SMP machine?
>> <googling for motherboard specs>
>> Aha, Socket 462. The Semprons he used are K7 based.
>
> OK, even more so - how does an even older cpu with lower clock,
> slower memory and only gigabit interconnect beat a quad-opt.
> it seems like some other factors were determining performance.

Well, each Opteron core would have to split its local memory pool with
its sister, so pure per-core bandwidth would be similar. The on-die memory
controller on the Opteron would give a latency bonus, but the
registered DIMMs would incur a penalty. The Socket A motherboards are
using an SiS chipset, which might be a little better tuned.
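
To put rough numbers on the bandwidth point, here is a back-of-envelope
sketch. The bus widths (64b and 128b) come from the thread; the memory
speeds (DDR333 on the Socket A boards, DDR400 on the Opterons) are my
assumptions, not something I've confirmed for either machine, and these
are theoretical peaks, not measured figures.

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s for a given bus width and transfer rate."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# ASSUMED memory speeds (not stated anywhere in the thread).
SEMPRON_MTS = 333  # DDR333 behind the Socket A chipset, hypothetical
OPTERON_MTS = 400  # registered DDR400, hypothetical

# One Sempron per node: a 64-bit interface all to itself.
sempron_per_core = peak_bandwidth_gbs(64, SEMPRON_MTS)

# Two Opteron cores share one 128-bit memory controller.
opteron_per_controller = peak_bandwidth_gbs(128, OPTERON_MTS)
opteron_per_core = opteron_per_controller / 2

print(f"Sempron per core: {sempron_per_core:.2f} GB/s")
print(f"Opteron per core: {opteron_per_core:.2f} GB/s")
print(f"Ratio:            {opteron_per_core / sempron_per_core:.2f}x")
```

Under those assumptions the per-core peaks differ by roughly 20%, which
is a lot closer than the raw 64b vs 128b comparison suggests. It says
nothing about latency or what the chipsets actually deliver.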

If the application largely factored out the interconnect, I could
accept the results being this close. But you're right. HT
(HyperTransport) is so much better for inter-process communication
than gigabit Ethernet, and GROMACS should derive a big advantage from it.

-- 
Geoffrey D. Jacobs

Go to the Chinese Restaurant,
Order the Special