On Tue, Feb 28, 2006 at 01:26:51PM -0500, Bill Rankin wrote:
> There is a research group here at Duke doing some application
> development and they are looking at implementing their codes in a
> cluster environment. The main problem is that 95% of their
> processing time is taken up by medium to large sized 3D FFTs (minimum
> 64 elements on an edge, 256k total elements).
That's a fairly small FFT on a parallel cluster. How many cpus do they imagine using? Perhaps the easiest thing to do is to whip up some code and invite people to benchmark it. The G-PTRANS and G-FFTE elements of HPC Challenge are relevant, but not many folks have submitted numbers.

Let's see: for 64**3 and 64 cpus with a 1D decomposition, there are 64**2 words per cpu, and a naive Alltoall will split those into 64 blocks of 64 words each, so each cpu sends one 64-word message to each of the 63 other nodes. The message length is then 1024 bytes (double precision complex, 16 bytes per word).

I would disagree with Stu's recommendations at this size due to the short message length, but I don't know if 2D would be a better decomposition at this size. FFTW version 2's MPI routines only do 1D decomposition.

-- greg
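
For what it's worth, the message-size arithmetic for the 64**3, 64-cpu case can be checked in a few lines of Python. The function name, the 16 bytes per word default (double precision complex), and the requirement that the grid divide evenly across cpus are my own illustrative assumptions; this is back-of-the-envelope bookkeeping, not a benchmark.

```python
def alltoall_message_bytes(n, cpus, bytes_per_word=16):
    """Message sizes for a naive Alltoall on an n**3 FFT with a 1D decomposition.

    Assumes (for illustration) that n**3 divides evenly across cpus and that
    each cpu's local data is split into `cpus` equal blocks, one per destination.
    Returns (words per cpu, words per message, bytes per message).
    """
    total_words = n ** 3
    if total_words % (cpus * cpus) != 0:
        raise ValueError("grid does not divide evenly across cpus")
    words_per_cpu = total_words // cpus          # local slab held by each cpu
    words_per_message = words_per_cpu // cpus    # block sent to each destination
    return words_per_cpu, words_per_message, words_per_message * bytes_per_word


if __name__ == "__main__":
    words_per_cpu, words_per_msg, msg_bytes = alltoall_message_bytes(64, 64)
    print(words_per_cpu, words_per_msg, msg_bytes)  # 4096 64 1024
```

Plugging in other edge lengths or cpu counts shows how quickly the per-destination message shrinks as cpus are added, which is the reason the short-message regime matters at this problem size.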