#3, below, to me, means "accelerator": fling data over the wall to the 
accelerator, go do something else, and when the answer comes back, use it.
This makes sense for computationally BIG tasks with O(n^2), O(n^3), or worse, 
exp(n), cost (say, inverting a matrix).
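That pattern can be sketched in plain Python (a toy sketch only: the "accelerator" here is just a worker thread, and `big_task` is made-up stand-in work, not a real device API):

```python
# Toy sketch of the "fling it over the wall" offload pattern: submit a big
# task to the "accelerator", keep doing host work, then use the answer.
from concurrent.futures import ThreadPoolExecutor

def big_task(n):
    # stand-in for a computationally BIG job, e.g. an O(n^3) matrix inversion
    return sum(i * i for i in range(n))

def main():
    with ThreadPoolExecutor(max_workers=1) as accel:  # pretend accelerator
        future = accel.submit(big_task, 100_000)      # fling data over the wall
        host_work = sum(range(1_000))                 # go do something else
        answer = future.result()                      # answer comes back; use it
    return host_work, answer
```

The point is only the shape of the control flow: the host never blocks until it actually needs the result.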

Jim Lux

From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org] On 
Behalf Of Brendan Moloney
Sent: Monday, March 11, 2013 9:02 PM
To: Joshua mora acosta
Cc: Beowulf List; Mark Hahn
Subject: Re: [Beowulf] difference between accelerators and co-processors

I think this analysis is missing some important points.
1) Comparing a single low-power APU to a single high-power discrete GPU doesn't 
make sense for HPC. Rather, we should compare a rack of equipment that can 
operate in the same power envelope.

2) You can bolt GDDR5 onto an APU, eliminating the discrete GPU's local 
bandwidth advantage (AMD is doing exactly this for the PS4). Also, we should 
really be comparing the bandwidth available to each GPU "core".
3) Almost every GPGPU research paper devotes significant space (perhaps the 
whole paper) to figuring out how to do efficiently on a GPU some step of the 
algorithm that is trivial on a CPU.  Avoiding round trips is a driving 
force in most algorithm development. So the programming should be easier, even 
if you still need (for now) explicit API calls for memory "transfers".
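A toy cost model (hypothetical constants, not measured numbers) shows why avoiding round trips drives so much of that algorithm work:

```python
# Each host<->device transfer pays a fixed latency plus a per-byte cost,
# so many small round trips lose badly to one batched transfer.
LATENCY = 10.0     # fixed cost per transfer (arbitrary units, assumed)
PER_BYTE = 0.01    # per-byte bandwidth cost (arbitrary units, assumed)

def transfer_cost(nbytes, trips):
    return trips * LATENCY + nbytes * PER_BYTE

chatty  = transfer_cost(1_000_000, 1000)  # 1000 small round trips
batched = transfer_cost(1_000_000, 1)     # one big batched transfer
```

With these made-up constants the chatty version pays 999 extra latencies for moving the exact same bytes; the real numbers differ, but the structure of the cost is the same.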

On Sun, Mar 10, 2013 at 5:55 PM, Joshua mora acosta 
<joshua_m...@usa.net> wrote:
See this paper
http://synergy.cs.vt.edu/pubs/papers/daga-saahpc11-apu-efficacy.pdf

While a discrete GPU underperforms an APU on host to/from device transfers by a
ratio of ~2X, it more than compensates with ~8-10X the computing power and
local bandwidth.

You can, though, cook up a test where you do little computation and everything
is bound by the host to/from device transfers.

Programming-wise there is no difference: since there isn't yet coherence,
explicit transfers through API calls are needed.
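That non-coherent, explicit-transfer model looks roughly like this (a Python mock; `memcpy_to`, `launch`, and `memcpy_from` are invented names standing in for real calls such as CUDA's cudaMemcpy or OpenCL's clEnqueueWriteBuffer):

```python
# Mock of the non-coherent programming model: the device only ever sees
# data that was explicitly copied in, and the host only sees results
# that were explicitly copied back out.
class MockDevice:
    def __init__(self):
        self._mem = {}                        # "device memory"

    def memcpy_to(self, buf, data):           # explicit host -> device
        self._mem[buf] = list(data)

    def launch(self, buf, kernel):            # kernel runs on the device copy
        self._mem[buf] = [kernel(x) for x in self._mem[buf]]

    def memcpy_from(self, buf):               # explicit device -> host
        return list(self._mem[buf])

dev = MockDevice()
dev.memcpy_to("a", [1, 2, 3])
dev.launch("a", lambda x: x * 2)
result = dev.memcpy_from("a")                 # [2, 4, 6]
```

With coherence, the two copy steps would disappear and both sides would simply touch the same memory.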

Joshua

------ Original Message ------
Received: 04:06 PM CDT, 03/10/2013
From: Vincent Diepeveen <d...@xs4all.nl>
To: Mark Hahn <h...@mcmaster.ca>
Cc: Beowulf List <beowulf@beowulf.org>
Subject: Re: [Beowulf] difference between accelerators and co-processors

>
> On Mar 10, 2013, at 9:03 PM, Mark Hahn wrote:
>
> >> Is there any line/point to make a distinction between accelerators and
> >> co-processors (that are used in conjunction with the primary CPU
> >> to boost up the performance)? Or can these terms be used interchangeably?
> >
> > IMO, a coprocessor executes the same instruction stream as the
> > "primary" processor.  this was the case with the x87, for instance,
> > though the distinction became less significant once the x87 came
> > onchip.
> > (though you certainly notice that FPU on any of these chips is mostly
> > separate - not sharing functional units or register files,
> > sometimes even
> > with separate micro-op schedulers.)
> >
> >> Specifically, the word "accelerator" is used commonly with GPU. On
> >> the
> >> other hand  the word "co-processors" is used commonly with Xeon Phi.
> >
> > I don't think it is a useful distinction: both are basically
> > independent computers.  obviously, the programming model of Phi is
> > dramatically more like a conventional processor than Nvidia's.
> >
>
> Mark, that's the marketing talk about Xeon Phi.
>
> It's surprisingly much the same, of course, except for the cache
> coherency; both are big vector processors.
>
> > there is a meaningful distinction between offload and coprocessor
> > approaches.
> > that is, offload means you use the device to accelerate a set of
> > libraries
> > (offload matrix multiply, eig, fft, etc).  to use a coprocessor, I
> > think the
> > expectation is that the main code will be very much aware of the
> > state of the
> > PCIe-attached hardware.
> >
> > I suppose one might suggest that "accelerator" to some extent implies
> > offload usage: you're accelerating a library.
> >
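Mark's offload-vs-coprocessor split can be caricatured in code (pure Python with invented names; a sketch of the two programming models, not any real API):

```python
# Offload model: the caller just calls a library routine; whether it ran
# on an attached device is the library's business, hidden from the caller.
def offloaded_matmul(a, b):
    # an offloaded BLAS-style library would decide internally whether to
    # ship this to the device; the caller never sees that state
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Coprocessor model: the main code explicitly tracks the device's state.
class Coprocessor:
    def __init__(self):
        self.busy = False
        self._result = None

    def start(self, work):       # main code knowingly hands work over
        self.busy = True
        self._result = work()

    def wait(self):              # and knowingly collects it later
        self.busy = False
        return self._result
```

In the first model you accelerate a library; in the second, the main code is "very much aware" of the attached hardware, exactly the distinction drawn above.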
> > another interesting example is AMD's upcoming HSA concept: since
> > nearly all
> > GPUs are now on-chip, AMD wants to integrate the CPU and GPU
> > programming
> > models (at least to some extent).  as far as I understand it, HSA
> > is based
> > on introducing a quite general intermediate ISA that can be
> > executed using
> > all available hardware resources: CPU and/or GPU.  although Nvidia
> > does have
> > its own intermediate ISA, they don't seem to be trying to make it
> > general,
> > *and* they don't seem interested in making it work on both C/GPU.
> > (well,
> > so far at least - I wouldn't be surprised if they _did_ have a PTX
> > JIT for
> > their ARM-based C/GPU chips...)
> >
> > I think HSA is potentially interesting for HPC, too.
> >   I really expect
> > AMD and/or Intel to ship products this year that have a C/GPU chip
> > mounted on
> > the same interposer as some high-bandwidth ram.
>
> How can an integrated GPU outperform a GPGPU card?
>
> Something like 25 watts versus 250 watts: which will be faster?
>
> I assume you will not build 10 nodes with 10 CPUs with integrated
> GPUs in order to rival a single card.
>
> >   a fixed amount of very high
> > performance memory sounds very tasty to me.  a surprising amount of
> > power in current systems is spent getting high-speed signals off-socket.
> >
> > imagine a package dissipating say 40W containing, say, 4 CPU cores,
> > 256 GPU ALUs and 2GB of gddr5.  the point would be to tile 32 of them
> > in a 1U box.  (dropping socketed, off-package dram would probably make
> > it uninteresting for memcached and some space-intensive HPC.)
> >
> > then again, if you think carefully about the numbers, any code today
> > that has a big working set is almost as anachronistic as codes that
> > use
> > disk-based algorithms.  (same conceptual thing happening: capacity is
> > growing much faster than the pipe.)
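A back-of-envelope calculation (hypothetical numbers, chosen only to illustrate the trend, not taken from any real system) makes the capacity-vs-pipe point concrete:

```python
# Seconds just to stream all of memory through the pipe once:
#   sweep_time = capacity / bandwidth
GB = 1e9
old_sweep = (2 * GB) / (10 * GB)      # 2 GB DRAM at 10 GB/s    -> 0.2 s
new_sweep = (256 * GB) / (100 * GB)   # 256 GB DRAM at 100 GB/s -> 2.56 s
# capacity grew 128x but bandwidth only 10x, so one full sweep of the
# working set got 12.8x slower -- the same squeeze disk went through
```

Whatever the exact figures, whenever capacity outgrows bandwidth, the time to touch everything once keeps rising, which is what makes big-working-set codes start to resemble disk-based ones.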
> >
> > regards, mark hahn.
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
