Mark Hahn wrote:
Didn't see anyone post this link regarding the Ageia PhysX processor. It is the 
most comprehensive write-up I have seen.

http://www.blachford.info/computer/articles/PhysX1.html

yes, and even so it's not very helpful.  "fabric connecting compute and
memory elements" pretty well covers it!  the block diagram they give
could almost apply directly to Cell, for instance.

fundamentally, about these cell/ageia/gpu/fpga approaches,
you have to ask:

        - how cheap will it be in final, off-the-shelf systems?  GPUs
        are most attractive this way, since absurd gaming cards have become
        a check-off even on corporate PCs (and thus high volume.)  it's
        unclear to me whether Cell will go into any million-unit products
        other than dedicated game consoles.

This will drive prices for the Cell way down; volume has a habit of doing that. FPGAs will likely remain several thousand dollars per unit (Virtex-4 and above) unless you can drive many units, in which case you have to start looking at the economics of ASICs if your algorithm never changes. If your algorithms change frequently, or you want to build a special processor per code, then you need the programmability of the FPGA. For this to make sense from a price point of view, you have to look at the overall performance you get out of it. Few people (I think) would be willing to pay $10k USD for a 10x performance delta, though at closer to a 100x delta I suspect the price wouldn't be an issue.
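To put purely illustrative numbers on that break-even: if the FPGA board is, say, $10k per unit and an ASIC run costs roughly $500k of NRE plus something like $100 per chip, the ASIC does not start to pay off until you are shipping on the order of fifty-plus units (500k / (10k - 0.1k) is about 50), and then only if the algorithm is truly frozen. Below that volume, or with algorithms that keep changing, the reprogrammable part wins on economics alone.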

        - does it run efficiently enough?  most sci/eng I see is pretty
        firmly based on 64b FP, often with large data.  but afaict,

Numerical stuff is pretty much DP FP right now. I saw one of the FPGA GRAPE units running a stellar dynamics simulator at SC05. If you are willing to give up IEEE754/854 for performance, you can do some pretty amazing things.
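To give a flavor of what "giving up IEEE754" can look like, here is a minimal sketch (illustrative only; a GRAPE-class pipeline picks custom widths per stage, this just uses a generic Q16.16 fixed-point format):

    /* Illustrative only: Q16.16 fixed point, the sort of narrow, non-IEEE
     * arithmetic an FPGA pipeline can implement far more cheaply than a
     * full 64b IEEE754 unit. */
    #include <stdint.h>

    typedef int32_t q16_16;   /* 16 integer bits, 16 fractional bits */

    static inline q16_16 q_from_double(double x) { return (q16_16)(x * 65536.0); }
    static inline double q_to_double(q16_16 x)   { return (double)x / 65536.0; }

    /* multiply in 64b, then shift back down to Q16.16 */
    static inline q16_16 q_mul(q16_16 a, q16_16 b)
    {
        return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
    }

    /* accumulate a*b into a wider accumulator, as a hardware pipeline
     * would, so low-order bits are not lost on every add */
    static inline int64_t q_mac(int64_t acc, q16_16 a, q16_16 b)
    {
        return acc + (int64_t)a * (int64_t)b;   /* product is Q32.32 */
    }

The point is not this particular format; it is that once you drop rounding modes, denormals and exceptions, the per-unit gate cost collapses and you can tile many more units onto the same part.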

        Cell (eg) doesn't do well on anything but in-cache 32b FP.

The idea with Cell and pretty much all of the APUs (acceleration processing units) out there today is that you need to double-buffer and constantly stream data in. This limits which algorithms they can work on, though not terribly so.
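A minimal sketch of the double-buffering pattern, with hypothetical dma_get_async()/dma_put_async()/dma_wait() calls standing in for whatever transfer API the particular APU actually provides:

    /* Double-buffered streaming: fetch block i+1 into one local buffer while
     * computing on block i in the other.  All dma_* calls and compute_block()
     * are hypothetical placeholders, not a real API. */
    #include <stddef.h>

    void dma_get_async(void *dst, const void *src, size_t nelem, int tag);
    void dma_put_async(void *dst, const void *src, size_t nelem, int tag);
    void dma_wait(int tag);                /* returns at once if nothing is outstanding */
    void compute_block(float *blk, size_t nelem);

    #define BLK 4096                       /* elements per block held in local store */

    void stream_process(const float *src, float *dst, size_t nblocks)
    {
        static float buf[2][BLK];
        int cur = 0;

        dma_get_async(buf[cur], &src[0], BLK, /*tag=*/cur);
        for (size_t i = 0; i < nblocks; i++) {
            int nxt = cur ^ 1;
            if (i + 1 < nblocks) {
                dma_wait(nxt);             /* old write-back of buf[nxt] must be done */
                dma_get_async(buf[nxt], &src[(i + 1) * BLK], BLK, /*tag=*/nxt);
            }
            dma_wait(cur);                 /* block until buf[cur] has landed */
            compute_block(buf[cur], BLK);  /* work entirely out of local store */
            dma_put_async(&dst[i * BLK], buf[cur], BLK, /*tag=*/cur);
            cur = nxt;
        }
        dma_wait(0); dma_wait(1);          /* drain outstanding write-backs */
    }

The algorithm only has to be blockable this way; large, irregular, data-dependent access patterns are where the streaming model starts to hurt.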

        GPUs have tantalizingly high local-mem bandwidth, but also don't
        really do anything higher than 32b.

Single precision isn't so bad for many calculations. You would be surprised how many of the auto companies run long crash simulations this way. There are other sources of error besides the base data type's accuracy that can swamp the calculation.
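A small sketch of what I mean by other considerations swamping the data type (illustrative only): naive single-precision accumulation versus Kahan-compensated single precision.

    /* Illustration: the summation algorithm can cost more accuracy than the
     * 32b-vs-64b choice.  Compile and run to compare. */
    #include <stdio.h>

    #define N 10000000

    static float x[N];

    float naive_sum(const float *a, int n)
    {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i];                     /* low-order bits lost as s grows */
        return s;
    }

    float kahan_sum(const float *a, int n)
    {
        float s = 0.0f, c = 0.0f;          /* c carries the lost low-order bits */
        for (int i = 0; i < n; i++) {
            float y = a[i] - c;
            float t = s + y;
            c = (t - s) - y;
            s = t;
        }
        return s;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            x[i] = 0.1f;
        printf("naive %.1f  kahan %.1f  intended %.1f\n",
               naive_sum(x, N), kahan_sum(x, N), 0.1 * (double)N);
        return 0;
    }

The compensated single-precision sum lands very close to the intended value while the naive one drifts badly, which is the point: a well-conditioned single-precision run can be perfectly usable, because the dominant error terms are often in the algorithm and the model, not in the last bits of the mantissa.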

        - how much time will it take to adapt to the peculiar programming
        model necessary for the device?  during the time spent on that,
        what will happen to the general-purpose CPU market?

Yes. This is why any APU must be easy to program. Non-programmable APUs or minimally programmable units (fixed-function units) are doomed to niches at best. You need to be able to turn your codes around on it very quickly: in days, not the months a Verilog/VHDL design cycle takes.

I think price, performance and time-to-market are all stacked against this approach, at least for academic/research HPC. it would be different if the

I disagree. On specific codes (possibly not FP-heavy ones right now, if we are talking about FPGAs), both the price/performance and the raw performance will be difficult to beat. Time to market is critical. Part of this is accelerator card design; part of it is the ease of spinning new applications. Application turnaround time cannot exceed something close to a month, or no one will do it.

For various informatics codes, you can get 100-300x type performance deltas (I have seen 300x reported in papers, others have reported higher). If you can get 100x better performance by adding in a $10kUSD board, would you do it?
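To put rough numbers on it (illustrative only): if the baseline is, say, a $3k dual-Opteron node, matching a 100x speedup with plain nodes means something on the order of 100 nodes, call it $300k before power, space and interconnect, and that assumes the code even scales that far. Against that, a $10k USD board is an easy decision. At 10x the same arithmetic is far less comfortable, which is why I draw the line where I do above.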

For chemistry codes and other FP-heavy codes, you need a DP (64b) accelerator. FPGAs don't make good DP FP units right now; IEEE754 is expensive in terms of gates, and you can't fit enough of them on the part. The best I have heard of is the SRC MAP processor, which had something like 100 units running at 150 MHz and could just eke out 11 GFLOPS. As this is comparable to a dual-core Opteron, it is not the way you want to go for double-precision floating point. There are other options (now and coming online).
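The arithmetic behind that: 100 units at 150 MHz, assuming roughly one result per unit per cycle, is about 15 GFLOPS peak, so 11 GFLOPS sustained is actually a decent fraction of peak; the problem is the peak itself. A current dual-core Opteron (say a 2.2 GHz part, 2 DP flops per core per cycle) peaks around 8-9 GFLOPS on a far cheaper piece of silicon.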

general-purpose CPU market stood still, or if there were no way to scale up
existing clusters...

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615