Mark Hahn wrote:
Didn't see anyone post this link regarding the Ageia PhysX processor. It is the 
most comprehensive write-up I have seen.

http://www.blachford.info/computer/articles/PhysX1.html

yes, and even so it's not very helpful.  "fabric connecting compute and
memory elements" pretty well covers it!  the block diagram they give
could almost apply directly to Cell, for instance.

fundamentally, about these cell/ageia/gpu/fpga approaches,
you have to ask:

        - how cheap will it be in final, off-the-shelf systems?  GPUs
        are most attractive this way, since absurd gaming cards have become
        a check-off even on corporate PCs (and thus high volume.)  it's
        unclear to me whether Cell will go into any million-unit products
        other than dedicated game consoles.

This will drive prices for the Cell way down; volume has a habit of doing that. FPGAs will likely remain several thousand dollars per unit (Virtex-4 and above) unless you can drive many units, in which case you have to start looking at the economics of ASICs if your algorithm never changes. If your algorithms change frequently, or you want to build a special processor per code, then you need the programmability of the FPGA. For this to make sense from a price point of view, you have to look at the overall performance you get out of it. Few people (I think) would be willing to pay $10k USD for a 10x performance delta, though at closer to a 100x delta I suspect the price wouldn't be an issue.
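To put purely illustrative numbers on that break-even: if the FPGA board is, say, $10k per unit and an ASIC run costs roughly $500k of NRE plus something like $100 per chip, the ASIC does not start to pay off until you are shipping on the order of fifty-plus units (500k / (10k - 0.1k) is about 50), and then only if the algorithm is truly frozen. Below that volume, or with algorithms that keep changing, the reprogrammable part wins on economics alone.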

        - does it run efficiently enough?  most sci/eng I see is pretty
        firmly based on 64b FP, often with large data.  but afaict,

Numerical stuff is pretty much DP FP right now. I saw one of the FPGA GRAPE units running a stellar dynamics simulator at SC05. If you are willing to give up IEEE754/854 for performance, you can do some pretty amazing things.
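To give a flavor of what "giving up IEEE754" can look like, here is a minimal sketch (illustrative only; a GRAPE-class pipeline picks custom widths per stage, this just uses a generic Q16.16 fixed-point format):

    /* Illustrative only: Q16.16 fixed point, the sort of narrow, non-IEEE
     * arithmetic an FPGA pipeline can implement far more cheaply than a
     * full 64b IEEE754 unit. */
    #include <stdint.h>

    typedef int32_t q16_16;   /* 16 integer bits, 16 fractional bits */

    static inline q16_16 q_from_double(double x) { return (q16_16)(x * 65536.0); }
    static inline double q_to_double(q16_16 x)   { return (double)x / 65536.0; }

    /* multiply in 64b, then shift back down to Q16.16 */
    static inline q16_16 q_mul(q16_16 a, q16_16 b)
    {
        return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
    }

    /* accumulate a*b into a wider accumulator, as a hardware pipeline
     * would, so low-order bits are not lost on every add */
    static inline int64_t q_mac(int64_t acc, q16_16 a, q16_16 b)
    {
        return acc + (int64_t)a * (int64_t)b;   /* product is Q32.32 */
    }

The point is not this particular format; it is that once you drop rounding modes, denormals and exceptions, the per-unit gate cost collapses and you can tile many more units onto the same part.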

        Cell (eg) doesn't do well on anything but in-cache 32b FP.

The idea with Cell and pretty much all of the APUs (acceleration processing units) out there today is that you need to double-buffer and constantly stream data in. This limits which algorithms they can work on, though not terribly so.
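A minimal sketch of the double-buffering pattern, with hypothetical dma_get_async()/dma_put_async()/dma_wait() calls standing in for whatever transfer API the particular APU actually provides:

    /* Double-buffered streaming: fetch block i+1 into one local buffer while
     * computing on block i in the other.  All dma_* calls and compute_block()
     * are hypothetical placeholders, not a real API. */
    #include <stddef.h>

    void dma_get_async(void *dst, const void *src, size_t nelem, int tag);
    void dma_put_async(void *dst, const void *src, size_t nelem, int tag);
    void dma_wait(int tag);                /* returns at once if nothing is outstanding */
    void compute_block(float *blk, size_t nelem);

    #define BLK 4096                       /* elements per block held in local store */

    void stream_process(const float *src, float *dst, size_t nblocks)
    {
        static float buf[2][BLK];
        int cur = 0;

        dma_get_async(buf[cur], &src[0], BLK, /*tag=*/cur);
        for (size_t i = 0; i < nblocks; i++) {
            int nxt = cur ^ 1;
            if (i + 1 < nblocks) {
                dma_wait(nxt);             /* old write-back of buf[nxt] must be done */
                dma_get_async(buf[nxt], &src[(i + 1) * BLK], BLK, /*tag=*/nxt);
            }
            dma_wait(cur);                 /* block until buf[cur] has landed */
            compute_block(buf[cur], BLK);  /* work entirely out of local store */
            dma_put_async(&dst[i * BLK], buf[cur], BLK, /*tag=*/cur);
            cur = nxt;
        }
        dma_wait(0); dma_wait(1);          /* drain outstanding write-backs */
    }

The algorithm only has to be blockable this way; large, irregular, data-dependent access patterns are where the streaming model starts to hurt.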

        GPUs have tantalizingly high local-mem bandwidth, but also don't
        really do anything higher than 32b.

Single precision isn't so bad for many calculations. You would be surprised how many of the auto companies run long crash simulations this way. There are other sources of error besides the base data type's accuracy that can swamp the calculation.
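A small sketch of what I mean by other considerations swamping the data type (illustrative only): naive single-precision accumulation versus Kahan-compensated single precision.

    /* Illustration: the summation algorithm can cost more accuracy than the
     * 32b-vs-64b choice.  Compile and run to compare. */
    #include <stdio.h>

    #define N 10000000

    static float x[N];

    float naive_sum(const float *a, int n)
    {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i];                     /* low-order bits lost as s grows */
        return s;
    }

    float kahan_sum(const float *a, int n)
    {
        float s = 0.0f, c = 0.0f;          /* c carries the lost low-order bits */
        for (int i = 0; i < n; i++) {
            float y = a[i] - c;
            float t = s + y;
            c = (t - s) - y;
            s = t;
        }
        return s;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            x[i] = 0.1f;
        printf("naive %.1f  kahan %.1f  intended %.1f\n",
               naive_sum(x, N), kahan_sum(x, N), 0.1 * (double)N);
        return 0;
    }

The compensated single-precision sum lands very close to the intended value while the naive one drifts badly, which is the point: a well-conditioned single-precision run can be perfectly usable, because the dominant error terms are often in the algorithm and the model, not in the last bits of the mantissa.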

        - how much time will it take to adapt to the peculiar programming
        model necessary for the device?  during the time spent on that,
        what will happen to the general-purpose CPU market?

Yes. This is why any APU must be easy to program. Non-programmable APUs or minimally programmable units (fixed-function units) are doomed to niches at best. You need to be able to turn your codes around on it very quickly: in days, not the months a Verilog/VHDL design cycle takes.

I think price, performance and time-to-market are all stacked against this approach, at least for academic/research HPC. it would be different if the

I disagree. On specific codes (possibly not FP-heavy ones right now, if we are talking about FPGAs), both the price/performance and the raw performance will be difficult to beat. Time to market is critical. Part of this is accelerator card design; part of it is the ease of spinning new applications. Application turnaround time cannot exceed something close to a month, or no one will do it.

For various informatics codes, you can get 100-300x type performance deltas (I have seen 300x reported in papers, others have reported higher). If you can get 100x better performance by adding in a $10kUSD board, would you do it?
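To put rough numbers on it (illustrative only): if the baseline is, say, a $3k dual-Opteron node, matching a 100x speedup with plain nodes means something on the order of 100 nodes, call it $300k before power, space and interconnect, and that assumes the code even scales that far. Against that, a $10k USD board is an easy decision. At 10x the same arithmetic is far less comfortable, which is why I draw the line where I do above.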

For chemistry codes and other FP-heavy codes, you need a DP (64b) accelerator. FPGAs don't make good DP FP units right now; IEEE754 is expensive in terms of gates, and you can't fit enough of them on the part. The best I have heard of is the SRC MAP processor, which had something like 100 units running at 150 MHz and could just eke out 11 GFLOPS. As this is comparable to a dual-core Opteron, it is not the way you want to go for double-precision floating point. There are other options (now and coming online).
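The arithmetic behind that: 100 units at 150 MHz, assuming roughly one result per unit per cycle, is about 15 GFLOPS peak, so 11 GFLOPS sustained is actually a decent fraction of peak; the problem is the peak itself. A current dual-core Opteron (say a 2.2 GHz part, 2 DP flops per core per cycle) peaks around 8-9 GFLOPS on a far cheaper piece of silicon.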

general-purpose CPU market stood still, or if there were no way to scale up
existing clusters...

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615