Hi Bill

I've tested FFTs rather extensively and run other codes that require a transpose. In my experience, a well-tuned gig-e network is capable of giving speedup, though not necessarily scaling that well. The most important thing is that you have full bisection bandwidth. Anything less will reduce your scaling. That is, if you use gig-e you can't trunk switches; you will need to stay within a single switch. Typically, I've seen a 16 cpu job on gig-e give about a 10 times speedup. Of course, it is processor/memory/nic dependent.
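For what it's worth, the reason bisection bandwidth is the whole game is that the transpose boils down to an all-to-all exchange: every cpu ships a block of the array to every other cpu. Here is a minimal C/MPI sketch of that redistribution step for a slab decomposition. It's my own illustration, not code from any particular FFT library; it assumes the grid divides evenly over the ranks and that the send buffer is already packed per destination, and the local FFTs and reordering are elided.

    /* Transpose step of a slab-decomposed 3D FFT (complex data stored as
       2 doubles per element).  Each rank starts with nx/nprocs x-planes
       of an nx*ny*nz array and has already done its 2D FFTs in the y-z
       planes.  To transform along x, the data is redistributed so each
       rank owns ny/nprocs y-planes instead.  That redistribution is an
       MPI_Alltoall: every rank exchanges an equal-sized block with every
       other rank. */
    #include <mpi.h>

    void transpose_slabs(double *slab, double *work,
                         int nx, int ny, int nz, MPI_Comm comm)
    {
        int nprocs;
        MPI_Comm_size(comm, &nprocs);

        /* doubles in each rank-to-rank block */
        int block = (nx / nprocs) * (ny / nprocs) * nz * 2;

        /* every rank sends one block to, and receives one block from,
           every other rank at the same time */
        MPI_Alltoall(slab, block, MPI_DOUBLE,
                     work, block, MPI_DOUBLE, comm);

        /* a local re-shuffle of 'work' into x-contiguous lines and the
           1D FFTs along x would follow here */
    }

With P cpus that's P*(P-1) blocks in flight at once, which is exactly the traffic pattern that punishes anything short of full bisection bandwidth.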

I've also run FFTs on Quadrics Elan 3/4, IBM HPS, and SGI Numalink 4. Since these are considerably higher-bandwidth networks, they perform much better. On a 16 cpu job I've seen around 14 times speedup on these higher bandwidth networks.

As the size increases (say 256 cpu's) the networks that maintain full bisection bandwidth scale the best. There are very few reasonably priced gig-e switches that maintain full bisection bandwidth at 256 cpu's, while Quadrics and HPS do (though their starting price is high, at larger system sizes they become a realistic proposition). Numalink falls away a little due to its weird network topology (dual plane quad bristle fat tree), in which network connectivity per cpu drops as the system gets larger.
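To put rough numbers on that for the smallest case Bill mentions below (64 elements on an edge) - back-of-envelope arithmetic of mine, so take it as indicative only:

    64^3 complex doubles                = 262,144 * 16 bytes  ~ 4 MB total
    per-rank slab on 16 cpus                                  ~ 256 KB
    per-pair alltoall block at 16 cpus  = 4 MB / 16^2         ~ 16 KB
    per-pair alltoall block at 256 cpus = 4 MB / 256^2        ~ 64 bytes

So at the small end the transpose is as much a latency problem as a bandwidth one, and throwing 256 cpu's at a single 64^3 transform probably won't buy you much on any network; the bisection bandwidth argument really bites on the larger transforms.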

If you want to go with gig-e a few things to be aware of:

* The nic matters (pro1000MT's give 10-15% better performance than pro1000T's)

* Go with single cpu nodes - higher per cpu network bandwidth

* If you get dual-core cpu's, treat each node as a single-core node (let the 2nd core do all the tcp stuff)

I've played around with multiply connected nodes (nodes that have dual-ported nics) and the 2nd nic doesn't give you much (10-15%) and requires a fair bit of stuffing around to get working well. I think you would be better off running your global fs and other services over 1 nic and your mpi traffic over the other. At least this way, your fs and services shouldn't be stealing your bandwidth.

You may even try running mpi-gamma on the 2nd nic, which should give you better bandwidth, hence better scaling (I haven't tried this).

If you want real measured numbers, drop me a personal email.

Stu.



On 01/03/2006, at 2:26, Bill Rankin wrote:

Hey gang,

I know that in the past, multidimensional FFTs (in my case, 3D) have posed a big challenge in getting them running well on clusters, mainly in the area of scalability. This is somewhat due to the need for an All2All communication step in the processing (although there seem to be some alternative approaches here).

There is a research group here at Duke doing some application development and they are looking at implementing their codes in a cluster environment. The main problem is that 95% of their processing time is taken up by medium- to large-sized 3D FFTs (minimum 64 elements on an edge, 256k total elements).

So I was wondering what the current "state of the art" is in clustered 3D FFTs? I've googled around a bit, but most of the results seem a little dated. If someone could point me to any recent papers or studies, I would be grateful.

One specific I am interested in is a good comparison of how different interconnects affect overall performance, as this will have a significant impact on the design of their cluster.

Thanks,

-bill

--
Dr Stuart Midgley
[EMAIL PROTECTED]

