Intel will soon have CSI and an on-die memory controller, following what
AMD has done for a few years. HT (HyperTransport) or CSI will help us build
machines based on NUMA or similar architectures. With current memory
technologies, I can't see any way around the "memory wall", and a 4-core
processor can consume all of the available memory bandwidth in some cases.
With NUMA we can get a machine that works like several of today's machines
connected by a fast on-board interconnect. Imagine a supercomputer on the
desktop, and what's next?

Many-core processors are coming; how do we power Beowulf clusters with
them? I think it is a very interesting topic.

Power6 is a really strange processor to me. It uses an in-order
architecture. I am looking forward to seeing a detailed evaluation of it.

Regards,
Li, Bo

----- Original Message -----
From: "Vincent Diepeveen" <[EMAIL PROTECTED]>
To: "Toon Knapen" <[EMAIL PROTECTED]>
Cc: <[email protected]>; "Robert G. Brown" <[EMAIL PROTECTED]>
Sent: Friday, August 24, 2007 9:37 PM
Subject: Re: [Beowulf] Intel Quad-Core or AMD Opteron

> Even worse,
>
> Doesn't Intel's SSE2 code in the Intel primitives by default have an
> 'if then else' so that on Opteron it runs without using SIMD?
>
> But apart from that, SIMD on the old K8 is very slow compared to Core2,
> though not by a factor of 2. Barcelona, for well-optimized code, should
> have a SIMD IPC up to 40+% higher than Core2, I guess.
>
> So the only 2 questions are when they release and especially at *what*
> price for the 4-socket mainboards.
>
> A 16-core Barcelona machine with 4 DDR2 memory controllers might be a very
> mighty system for all kinds of applications that need shared memory to
> scale well.
>
> When releasing the Barcelona core within a few months from now, AMD will
> have a huge lead over Intel with respect to 4-core CPUs, it seems to me.
>
> I personally feel Intel's choice of a CPU design using tiny L1 caches is,
> from a performance viewpoint, a catastrophic one. If there is just ONE
> competitor for an Intel chip that manages to clock a CPU at nearly the
> same clock as Intel and with the same number of cores, then Intel
> usually gets totally outperformed. Now that Intel & AMD produce their
> CPUs on the same type of machines, it seems to me that AMD will in
> general outperform Intel.
>
> Comparing the 2006 Core2 with a 2003 release is not a very fair
> comparison IMHO.
>
> We can definitely conclude that Intel managed to produce their
> new-generation CPU (Core2) more than 1 year sooner than AMD did, using a
> simple trick, namely gluing 2 dual-core chips together.
>
> In the meantime I keep wondering more and more about Intel not having an
> equivalent on the market for AMD's HyperTransport.
>
> For the high end, when buying multiple-socket nodes, it is hard to see
> Intel as an alternative to Barcelona-driven machines, as it doesn't have
> any form of load balancing, thanks to having just 1 memory controller for
> all cores.
>
> Most interesting for scientists might be buying a few nodes with a
> dual-rail network, each node being a 4-socket quad-core AMD machine,
> initially at perhaps 2 GHz. Then at the end of 2008 you can upgrade the
> CPUs to 3+ GHz.
>
> When also putting a lot of RAM into such an AMD machine, such a node of
> course also totally annihilates Power6, even before Power6 goes into
> production, at a fraction of the price of a Power6 node.
>
> The advantage of using 4-socket machines for a cluster/supercomputer is
> obviously that the network costs form a smaller part of the total
> solution, while keeping the total number of nodes limited.
>
> For a few nodes you could arguably use 8-socket solutions, not to scale up
> to more cores, as most software can't handle such bad memory latencies,
> but because you might even outgun Power6 in terms of total memory per
> node.
>
> What is the amount of RAM that Power6 supports versus the 8-socket AMD
> solutions?
>
> Best Regards,
> Vincent
>
>
>
> On Fri, 24 Aug 2007, Toon Knapen wrote:
>
>> > I understand that, when comparing Quad-Core Xeons with Opterons,
>> > people focus on the scalability issues of the different multi-core
>> > architectures, but we've run some benchmarks on both and the thing
>> > that surprised me the most at the time was that if your application
>> > makes much use of the functions provided by the Intel Math Kernel
>> > Library, a single Xeon core (e.g. Clovertown) can be up to twice as
>> > fast as a single Opteron core.
>>
>>
>> You are comparing Intel MKL on Xeon with what exactly on Opteron? Intel
>> MKL on Opteron is certainly not optimal. I hope you compared to GotoBLAS
>> on Opteron.
>>
>> t

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
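
As a rough illustration of the 4-socket argument in the quoted message
(fewer nodes, and the network forming a smaller part of the total cost),
here is a minimal Python sketch. The cost model and every price in it are
made-up placeholders in arbitrary units, not figures from the thread; only
the quad-core assumption comes from the discussion. The function names
nodes_needed and network_share are hypothetical.

```python
import math

CORES_PER_SOCKET = 4  # quad-core parts, as discussed in the thread


def nodes_needed(total_cores, sockets_per_node):
    """Nodes (and hence network ports) needed to reach total_cores."""
    cores_per_node = sockets_per_node * CORES_PER_SOCKET
    return math.ceil(total_cores / cores_per_node)


def network_share(sockets_per_node, socket_cost, chassis_cost, port_cost):
    """Fraction of per-node spend that goes to the network.

    Assumed toy model: one network port per node, and a node costs
    chassis_cost plus socket_cost per populated socket.
    """
    node_cost = chassis_cost + sockets_per_node * socket_cost
    return port_cost / (node_cost + port_cost)


if __name__ == "__main__":
    # All prices below are placeholders in arbitrary units.
    for sockets in (1, 2, 4, 8):
        n = nodes_needed(256, sockets)
        share = network_share(sockets, socket_cost=1.0,
                              chassis_cost=1.0, port_cost=1.0)
        print(f"{sockets} sockets/node: {n} nodes, "
              f"network share {share:.0%}")
```

This toy model assumes the cost per socket stays flat as the socket count
grows. In practice the price of 4- and 8-socket mainboards, which the
quoted message flags as an open question, could shift the balance.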
