Hey Stuart,

Thanks for your answer ... That sounds compelling. May I ask a few more questions?

So should I assume that this was a threaded SMP-type application (OpenMP, pthreads), or is it MPI-based? Is the supporting multi-core host CPU of Sandy Bridge vintage? Have you been able to compare the hyper-threaded, multi-core scaling on the Sandy Bridge side of the system with that on the Phi (fewer cores to compare, of course)? Using the Intel compilers, I assume ... how well do your kernels vectorize? I am curious about the observed benefits of hyper-threading, which generally offers little to floating-point-intensive HPC computations where functional-unit collision is an issue.

You said you have 2 Phis per node. Were you running a single job across both? Were the Phis in separate PCIe slots or on the same card (sorry, I should know this, but I have just started looking at the Phi)? If they are on separate cards in separate slots, can I assume that I am limited to MPI-parallel implementations when using both?

Maybe that is more than a few questions ... ;-)

Regards,

Richard Walsh
Thrashing River Consulting

On Tue, Feb 12, 2013 at 10:46 AM, Dr Stuart Midgley <sdm...@gmail.com> wrote:

> It was simple really. Within 1hr, I had recompiled a large amount of our
> codes to run on the Phis and then ssh'ed to the Phi and ran them… Saw that
> a single Phi was faster than our current 4-socket AMD 6276 (64 cores) and
> then ordered machines with 2 Phis in them :)
>
> I didn't bother with any of the compiler directives etc… just treated them
> like a 240-core (hyper-threaded) computer… and saw great scaling.
>
> --
> Dr Stuart Midgley
> sdm...@sdm900.com
>
> On 12/02/2013, at 11:12 PM, Richard Walsh <rbwcn...@gmail.com> wrote:
>
> > Hey Stuart,
> >
> > I am interested in what sold you on the Phi. My cursory look
> > suggested that using the Phi in Intel's offload mode (which
> > preserves the scalar performance) was not much easier to
> > program than writing in CUDA ... and that using the Phi as
> > a standalone processor, while a programming convenience,
> > suffers on scalar code. Even that programming convenience
> > is limited by the fact that you have to think both in terms of
> > vectors and threads.
> >
> > Also, the speed-ups I have seen generally seem modest,
> > understanding that GPU performance hype is exaggerated.
> >
> > Hearing what you like would be interesting.
> >
> > Thanks,
> >
> > Richard Walsh
> > Thrashing River Consulting
> >
> > On Tue, Feb 12, 2013 at 10:02 AM, Dr Stuart Midgley <sdm...@gmail.com> wrote:
> > I've started a blog to document the process I'm going through to get our
> > Phis going.
> >
> > http://phi-musings.blogspot.com.au
> >
> > It's very sparse at the moment, but will get filled in a lot over the
> > next day or so… I've finally got them booting.
> >
> > FYI we currently have 100 co-processors and should have the next 160 or
> > so in a few weeks.
> >
> > --
> > Dr Stuart Midgley
> > sdm...@sdm900.com
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
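For anyone following the workflow Stuart describes above ("no compiler directives, just treat it like a 240-core machine"), it amounts to an ordinary OpenMP build cross-compiled for the card and launched over ssh. The sketch below is a hypothetical illustration only, not Stuart's actual code; the mic0 host name, the /tmp staging directory, and the libiomp5.so location are typical MPSS / Composer XE defaults of that era and may differ on other installations.

    /* phi_native.c -- minimal OpenMP kernel; nothing Phi-specific in the
     * source, it is treated exactly like any other SMP code. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        double sum = 0.0;

        /* Reduction spread across every hardware thread the card exposes
         * (e.g. 60 cores x 4 threads = 240 on a Xeon Phi 5110P). */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 1; i <= 100000000L; i++)
            sum += 1.0 / (double)i;

        printf("threads=%d  sum=%.6f\n", omp_get_max_threads(), sum);
        return 0;
    }

    # Cross-compile for the MIC architecture, stage the binary and the MIC
    # build of the OpenMP runtime on the card, then run it natively:
    icc -mmic -openmp -O3 phi_native.c -o phi_native.mic
    scp phi_native.mic /opt/intel/composerxe/lib/mic/libiomp5.so mic0:/tmp/
    ssh mic0 "LD_LIBRARY_PATH=/tmp OMP_NUM_THREADS=240 /tmp/phi_native.mic"

The offload model discussed earlier in the thread would instead keep main() on the host and wrap the hot loop in Intel's #pragma offload target(mic) directive, which is where the extra data-movement bookkeeping that Richard compares to CUDA comes in.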