----- Original Message ----- From: "Mark Hahn" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <beowulf@beowulf.org>
Sent: Wednesday, September 20, 2006 6:51 PM
Subject: Re: [Beowulf] Has anyone actually seen/used a cell system?


Can anyone point me to a url, or tell me what their
experience is with this technology?  Is it as fast as
it's purported to be?

I haven't come anywhere near a Cell, but then again, I'm not sure I'd want to. 14.6 Gflops (64b, and assuming the full 8 SPEs) isn't bad, but then again, a 3 GHz Core2 dual-core is 24 Gflops, and almost certainly a lot more accessible: shipping now, runs Linux, supported by compilers and goto-blas, etc.

Come on, let's do a realistic comparison. Assuming IBM didn't totally mess up,
let's do an objective comparison for multiplication.

Gflops is simply an overrated metric.

More than anything else, what determines the number of matrix elements you can multiply per second
is the slow instruction on most CPUs: multiply.

It is 4 cycles or so on the P4 (SSE2) and 4 cycles on the K8.

I haven't seen a Conroe document yet, but knowing it also has just a SINGLE execution unit doing multiplies (probably sharing the SSE2 multiplication unit with the FPU and also using it for integers or something), it probably also takes a cycle or 4.

It is just possible that doing a multiplication there doesn't block all the other execution units (which is what the K8 seems to do).

For the NTT I'm doing here (a bug-free form of multiplication; with the FFT version you never know for sure that your result is correct, and you have to redo it a second time to be 100% sure), what is interesting is a multiplication of 64 x 64 bits == 128 bits. So that's obviously integer calculation.
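To make the 64 x 64 bits == 128 bits operation concrete, here is a minimal sketch in Python. The function names are hypothetical and only for illustration; Python integers are arbitrary precision, so the split into a high and a low 64-bit half merely mimics what a hardware multiply instruction would hand back in two registers.

```python
MASK64 = (1 << 64) - 1  # largest unsigned 64-bit value


def mul_64x64_to_128(a, b):
    """Return the (high, low) 64-bit halves of the 128-bit product a * b."""
    if not (0 <= a <= MASK64 and 0 <= b <= MASK64):
        raise ValueError("operands must fit in 64 unsigned bits")
    product = a * b  # exact, at most 128 bits wide
    return product >> 64, product & MASK64


def mulmod(a, b, p):
    """Modular multiply of the kind an NTT inner loop needs (p is an assumed prime)."""
    high, low = mul_64x64_to_128(a, b)
    return ((high << 64) | low) % p
```

The point of the sketch is only that the full 128-bit result is needed before the modular reduction, which is why the cost of the wide integer multiply dominates.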

If we compare Core2 there: Core2 is an ideal processor for just about everything, yet it has only 2 cores @ 3GHz.

2 cores @ 3GHz / 4 cycles = 1.5 GHz multiply rate

Now compare the CELL processor. I'm not sure about its latest plans (I vaguely remember 4GHz as its target, and I would be amazed if IBM actually gets it to 4GHz). Most likely it will also manage a cycle or 4 for a 64 x 64 bits == 128 bits multiply.

Then we're speaking about 8 * 4GHz / 4 = 8 GHz multiply rate.

Potentially that is simply more than 5 times faster than Core2 (8 / 1.5) for what is the most time-consuming part of
matrix multiplications, namely the multiplication unit.
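The back-of-the-envelope numbers can be written out as a small sketch. It assumes, as the text does, one multiply unit per core and 4 cycles per multiply with no pipelining; the 4GHz Cell clock is a guess, not a spec. The function name is hypothetical.

```python
def multiply_rate_ghz(cores, clock_ghz, cycles_per_multiply):
    """Peak multiplies per nanosecond, assuming one unpipelined multiplier per core."""
    return cores * clock_ghz / cycles_per_multiply


core2_rate = multiply_rate_ghz(cores=2, clock_ghz=3.0, cycles_per_multiply=4)
cell_rate = multiply_rate_ghz(cores=8, clock_ghz=4.0, cycles_per_multiply=4)
ratio = cell_rate / core2_rate

print(f"Core2: {core2_rate} G multiplies/s")
print(f"Cell:  {cell_rate} G multiplies/s")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the ratio works out to roughly 5.3; a different clock or multiply latency on either chip changes it proportionally.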

Now there is something to be said for SSE here, which can multiply 2 at a time in one go.

On the other hand, we do not know the specs of the CELL there; according to one document I read (which could be totally outdated), it should be able to do more instructions per cycle than Core2.

If not, then Core2 wins back a factor 1.5 or so in speed there; still no big deal. CELL just beats it totally there.

Now it is of course obvious that the vast majority of cluster resources goes to matrix-multiplication-type software. The other part of the story is that CELL might be extremely weak on branch mispredicts, which would make it a self-destructing chip for my chess software.

Say about 70% will be extremely happy with that chip and 30% will just praise Core2 to the skies.

There is something positive about Core2, however, that CELL cannot claim: Core2 we can already order in a store.

if you could readily get a 8-16x PCIE card with 2 or more Cell chips and a bunch of ~50 GB/s local memory, for cheap, it could be quite something.

Yeah that's faster than most supercomputers for matrix calculations.

And also for a CHEAP price.

For all the highend guys who will then say: "oh ahhh au, but how about losing bits".

Well, nothing is as inaccurate as FFT calculations with floating-point roundoff everywhere.

NTT is totally superior there (but factor 2 slower).
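To illustrate why an NTT has no roundoff, here is a minimal sketch of an NTT-based multiplication in Python. It is not the code discussed in this mail: it uses the small, commonly used prime 998244353 with primitive root 3 (an assumption chosen for readability) rather than a 64-bit modulus, and a simple recursive transform rather than a tuned one. The result is exact as long as every true coefficient is below the modulus.

```python
MOD = 998244353  # 119 * 2**23 + 1, an NTT-friendly prime (assumed for this sketch)
ROOT = 3         # a primitive root modulo MOD


def ntt(values, invert=False):
    """Recursive radix-2 number-theoretic transform; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return values[:]
    w_n = pow(ROOT, (MOD - 1) // n, MOD)  # principal n-th root of unity mod MOD
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)      # modular inverse via Fermat's little theorem
    even = ntt(values[0::2], invert)
    odd = ntt(values[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD              # the one integer multiply per butterfly
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return out


def multiply(x, y):
    """Exact convolution of two coefficient lists, all arithmetic in integers."""
    result_len = len(x) + len(y) - 1
    size = 1
    while size < result_len:
        size *= 2
    fx = ntt(x + [0] * (size - len(x)))
    fy = ntt(y + [0] * (size - len(y)))
    product = ntt([u * v % MOD for u, v in zip(fx, fy)], invert=True)
    inv_size = pow(size, MOD - 2, MOD)
    return [c * inv_size % MOD for c in product][:result_len]
```

For example, `multiply([1, 2, 3], [4, 5, 6])` gives the coefficients of (1 + 2x + 3x^2)(4 + 5x + 6x^2) with no floating point involved at any step.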

And if you really have no other argument than that, well, just run a SECOND cluster of Cells and let it do the calculation a second time. That gives a 100% verification
that your FFT ran correctly too.

Of course another disadvantage of CELL will probably be limited RAM.

Certain machines (orion!) which are relatively cheap and have a couple of hundred gigabytes of RAM at an attractive price can really boost certain applications.

Yet pissing on CELL isn't a real good idea.

If what you need is massive calculation power, then 8 cores @ 4GHz will of course kick 2 cores @ 3GHz silly, especially knowing that most chip manufacturers don't seem to have an especially fast multiply instruction on their chips.

Just measuring gflops is total madness.

The N log N in those calculations is the number of multiplies.
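As a rough sketch of that count: for a plain radix-2 transform of length N (an assumption; other radices shave the constant), there are log2(N) passes of N/2 butterflies, and each butterfly does one twiddle multiply. The function name is hypothetical.

```python
def radix2_butterfly_multiplies(n):
    """Twiddle multiplies in one radix-2 transform of length n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    passes = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * passes


for n in (8, 1024, 1 << 20):
    print(n, radix2_butterfly_multiplies(n))
```

Dividing such a count by the sustained multiply rate of a chip gives a lower bound on transform time that depends only on the multiplier, not on the advertised gflops.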

Make a chip with 2 integer multiplication units that don't block each other, and NTT in integers is faster than any SSE implementation of FFT, besides having 0 roundoff errors.

CELL is already quite ideal there in that it has 8 cores.

Yet of course it is wishful thinking that such chips will exist any time soon with 2 multiplication units at a very cheap price (no, Itanium isn't a cheap chip; additionally it's just 1.6GHz), which would simply speed up that calculation code by a factor 2.

If I do integer multiplications nonstop on a K8 dual-core box with 2 chips (4 cores in total), then after a number of days the machine is sometimes just DEAD. Black screen, etcetera. The chips simply failed.

It only happens if you EXCLUSIVELY do NTT nonstop, so it seems that, at least for K8 dual-core chips, the multiplication unit is extremely weak and sits on the worst-case path.

That probably means that adding a second unit will not cost that many more transistors, but will decrease yields, making chip production a tad more expensive.

So please don't piss on a chip that hopefully has 8 such units instead of the 2 in today's chips.

It is potentially at least a factor 4 faster at the same clock for such DSP-type code.

Vincent

Apparently RedHat is developing
EL 4.3 to run on the system?

to an OS, it's basically a kinda low-end PPC chip with 8 very weird FP coprocessors, the latter not relevant to the OS...
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

