> So I was wondering what the current "state of the art" is in
> clustered 3D FFTs?  I've googled around a bit, but most of the
> results seem a little dated.  If someone could point me to any recent
> papers or studies, I would be grateful.

 You can find some reasonably recent FFT-related material at this
link (read the preprint on the FFT strategies):

 http://pages.unibas.ch/comphys/comphys/SOFTWARE/

 Anyway, the "alltoall" can be a real killer. If you want to use lots
of CPUs with really small packets, go for something like Parastation
MPI ( http://www.parastation.com/ ). This MPI package is MPICH-based
and cuts down latencies for small packets by about 30% (really!). And
the best part is that it is free for academics.
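
 To put a rough number on why latency matters for small packets: a
simple latency/bandwidth model charges each of the p-1 messages a rank
sends in an "alltoall" a fixed startup cost plus a per-byte cost. The
sketch below is only that model. The latency and bandwidth figures are
made-up placeholders, not measurements from any cluster, and the
function names are introduced just for this illustration.

```python
def alltoall_time(ncpu, msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Crude per-rank estimate: (ncpu - 1) messages, each paying a
    startup latency plus a size-dependent transfer time."""
    per_message = latency_s + msg_bytes / bandwidth_bytes_per_s
    return (ncpu - 1) * per_message


# Placeholder numbers (assumptions, not measured values).
LATENCY = 50e-6      # 50 microseconds startup cost per message
BANDWIDTH = 100e6    # 100 MB/s sustained


def saving_percent(ncpu, msg_bytes, latency_cut=0.30):
    """Percent of modeled time saved when latency drops by latency_cut."""
    base = alltoall_time(ncpu, msg_bytes, LATENCY, BANDWIDTH)
    tuned = alltoall_time(ncpu, msg_bytes, (1.0 - latency_cut) * LATENCY, BANDWIDTH)
    return 100.0 * (base - tuned) / base


for size in (64, 64 * 1024):
    print(f"{size:6d} bytes: {saving_percent(16, size):4.1f}% faster")
```

 With these placeholder values, the model gives back nearly the whole
30% for 64-byte messages but only a couple of percent for 64kB ones,
which is why a lower-latency MPI helps mostly in the small-packet
regime.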

 For large packets, things get trickier. On a dual-Opteron cluster
around here, for example, there is a significant "choking" effect whose
cause is unknown. Using skampi 4.1, one gets the results shown below for
64kB packets (this is with the bleeding-edge version of Open-MPI, 1.1
pre-alpha). The Open-MPI developers promise to pay specific attention to
the "alltoall" function, so things might become quite good at some
point.
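
 For readers less familiar with the collective: in an "alltoall" every
rank holds one distinct block for every other rank, so the whole
operation is a transpose of a p x p grid of blocks, which is the data
movement a distributed 3D FFT typically needs between its 1D FFT
stages. The sketch below simulates this in plain Python, with lists
standing in for ranks (no MPI is involved). The function names are made
up for illustration, and the round-robin schedule is just one common
way to stage the exchange as point-to-point transfers, not necessarily
what any MPI library does internally.

```python
def alltoall(send):
    """send[src][dst] is the block rank src holds for rank dst.
    Returns recv such that recv[dst][src] == send[src][dst]."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


def alltoall_pairwise(send):
    """Same exchange done as p - 1 rounds of point-to-point transfers.
    In round `step`, rank r sends to (r + step) % p and receives from
    (r - step) % p: one outgoing and one incoming message per round."""
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    for r in range(p):
        recv[r][r] = send[r][r]        # local block, no network traffic
    for step in range(1, p):
        for r in range(p):
            dst = (r + step) % p
            recv[dst][r] = send[r][dst]
    return recv


p = 4
send = [[f"{src}->{dst}" for dst in range(p)] for src in range(p)]
assert alltoall(send) == alltoall_pairwise(send)
print(alltoall(send)[2])   # ['0->2', '1->2', '2->2', '3->2']
```

 Each rank ends up with p blocks, one from every rank, so the number of
messages in flight grows as p*(p-1) while the per-message size stays
fixed.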


[ncpu ms std]

(choking at 15 cpus)
#/[EMAIL PROTECTED]/
       2     275.1      1.6      8
       3    1890.2     31.3      8
       4    3467.1     85.0      8
       5    5843.9     66.3      8
       6    8720.9    110.6      8
       7    9598.8     99.6      7
       8   11757.9    256.4      6
       9   13428.2    166.4      8
      10   14623.4    176.2      8
      11   16689.4    171.9      4
      12   18941.4    502.9      5
      13   20105.2     99.0      8
      14   22731.1    155.0      2
      15  123939.7  49248.4      8
      16  142048.0  43888.8      8

 If a bunch of isend+irecv calls is used instead of "alltoall", the
choking effect shows up much earlier:

(choking at 6 cpus)
#/[EMAIL PROTECTED]/
       2     247.4      0.8      8
       3    1861.8     10.1      8
       4    3158.4     24.5      8
       5    4270.0     75.0      2
       6  225351.5  12504.5      2
       7  228399.5  14770.5      2
       8  247087.5  14448.4      2
       9  243806.7   3878.9      8
      10  248353.0   6640.9      2
      11  267541.5   5210.1      8
      12  286600.1   1665.1      2
      13  277546.5   4208.1      8
      14  364208.9  98276.9      2
      15  392139.0 101163.9      2
      16  367182.1  97711.0      2
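
 To make the size of the two jumps explicit, here is a small sketch
that takes the time column of each table above and reports where the
largest step occurs and how big the slowdown is relative to the
previous cpu count. The helper name and the "largest absolute step"
criterion are my own choices for illustration; nothing here explains
the cause of the choking.

```python
NCPU = list(range(2, 17))

# Time column of the two tables above, for 2..16 cpus.
ALLTOALL = [275.1, 1890.2, 3467.1, 5843.9, 8720.9, 9598.8, 11757.9,
            13428.2, 14623.4, 16689.4, 18941.4, 20105.2, 22731.1,
            123939.7, 142048.0]
ISEND_IRECV = [247.4, 1861.8, 3158.4, 4270.0, 225351.5, 228399.5,
               247087.5, 243806.7, 248353.0, 267541.5, 286600.1,
               277546.5, 364208.9, 392139.0, 367182.1]


def largest_jump(ncpu, times):
    """Return (cpu count, slowdown factor) at the largest absolute step."""
    i = max(range(1, len(times)), key=lambda k: times[k] - times[k - 1])
    return ncpu[i], times[i] / times[i - 1]


for name, times in (("alltoall", ALLTOALL), ("isend+irecv", ISEND_IRECV)):
    n, factor = largest_jump(NCPU, times)
    print(f"{name}: biggest jump at {n} cpus, "
          f"{factor:.1f}x slower than at {n - 1}")
```

 This picks out 15 cpus for "alltoall" (about 5.5x slower than at 14)
and 6 cpus for isend+irecv (about 53x slower than at 5). Below its
choking point the isend+irecv variant is actually a bit faster, for
example 4270.0 versus 5843.9 at 5 cpus.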

  Konstantin


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org