On Jan 11, 2013, at 2:59 PM, Reuti wrote:

> On 11.01.2013 at 14:22, Vincent Diepeveen wrote:
>
>> On Jan 11, 2013, at 6:03 AM, Bill Broadley wrote:
>>
>>> Over the last few months I've been hearing quite a few negative
>>> comments about AMD. Seems like most of them are extrapolating from
>>> desktop performance.
>>>
>>> Keep in mind that it's quite a stretch going from a desktop (single
>>> socket, 2 memory channels) to a server (dual socket, 4x the cores, 8
>>> memory channels).
>>>
>>
>> Bill, a 2-socket system doesn't deliver 512 GB of RAM.
>
> Maybe I got it wrong, but I was checking these machines recently:
>
> IBM's x3550 M4 goes up to 768 GB with 2 CPUs
> http://public.dhe.ibm.com/common/ssi/ecm/en/xsd03131usen/XSD03131USEN.PDF

Shops selling it say it has a max of 384 GB RAM. Those are going to be
expensive DIMMs, btw. See:
http://www.comcom.nl/p/ibm/default_product/7915d2g/x3650_m4_xeon_6c_e5_2630_95w/?=&channel_code=70&product_code=44985452&utm_source=adwords-generiek&gclid=CI3K4sO84LQCFQRc3godZgwA7Q

>
> IBM's x3950 X5 goes up to 3 TB with their MAX-5 extension using 4
> CPUs, so I assume 1.5 TB with 2 CPUs could work too
> http://public.dhe.ibm.com/common/ssi/ecm/en/xsd03054usen/XSD03054USEN.PDF

$200k a box? Shops here don't offer it; IBM does. It starts at $120k,
and for that you only get 128 GB of RAM. Let's say we multiply the RAM
by 4 to get 512 GB:
http://www-304.ibm.com/shop/americas/webapp/wcs/stores/servlet/default/ProductDisplay?productId=4611686018426177038&storeId=1&langId=-1&categoryId=4611686018425279711&dualCurrId=73&catalogId=-840

>
> -- Reuti
>
>
>> Your comparison in the 2-socket domain doesn't make sense for someone
>> who needs 512 GB of RAM; the performance of 4-socket systems is
>> totally different from 2-socket ones.
>>
>> [snip]
>>>
>>> I figured I'd add a few comments:
>>> * Latency for a quad-socket AMD is around 64 ns to a random piece
>>>   of memory (not 600 ns as recently mentioned).
>>
>> I wrote a test program for this in 2003.
>>
>> You obviously have no idea what TLB-thrashing accesses look like in
>> the hundreds-of-gigabytes range.
>>
>> There are zero cheap systems on the planet where you can get a bunch
>> of random bytes in 64 ns from a random spot out of 500 GB of RAM, from
>> a memory line you hadn't opened before and which is certainly not in
>> the cache yet. You will be looking at 400+ ns latencies, best case.
>>
>> You won't get it faster on any affordable platform (of course 512 GB
>> of SRAM would be faster, but let's not go into theoretical discussions
>> here, as you can't afford 512 GB of SRAM).
>>
>>> * AMD quad sockets with 512 GB RAM start around $9k (USD)
>>
>> You can easily build one with new components from eBay for $2k.
>> Then add the 512 GB RAM price to that.
>> New from a shop, the AMD stuff is dirt cheap as well (a single core of
>> the new Bulldozer line isn't fast, of course): fully assembled,
>> ready-to-run offers are around the $6k mark, excluding the 512 GB of
>> RAM, of course.
>>
>> Yet it has better latency to a 512 GB block of RAM than Intel's
>> 4-socket systems.
>>
>> And that will be many, many hundreds of nanoseconds, of course.
>>
>>> * With OpenMP, pthreads, MPI or other parallel-friendly code, a quad
>>>   socket AMD can look up a random cache line approximately every
>>>   2.25 ns (64 threads banging on 16 memory channels at once).
>>
>> You still didn't get the picture of TLB-thrashing software, huh?
>>
>> It reads from a random memory location each time. Only at the end of
>> the calculation does the search space converge a bit, and even then
>> it's still random.
>>
>> A measurement I have from a slightly older 8-socket Intel box here is
>> 700 ns for similar TLB-thrashing behaviour.
>>
>>> * I've seen no problems with the AMD memory system; in general the
>>>   2k-pin, 4-memory-bus AMD sockets seem to perform similarly to Intel.
>>
>> For random accesses on 1 or 2 sockets there are huge differences (all
>> cores busy).
>>
>> Intel single socket is around 90 ns for my benchmark, and Bulldozer
>> single socket around 150-170 ns (8 cores busy).
>>
>> You really have no idea what 'random' reads are.
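The "random reads" argument turns on dependent loads: when the next address is only known after the previous load returns, every access pays the full memory latency, and over a multi-hundred-gigabyte footprint with ordinary 4 KiB pages nearly every hop also misses the TLB. A common way to measure that is a pointer chase over a single random cycle. The sketch below is a hypothetical illustration, not Vincent's 2003 test program (which isn't shown in this thread); the names `sattolo_cycle`, `chase` and `ns_per_hop` are made up for the example. CPython adds interpreter overhead to every hop and stores list elements as pointers to int objects, so the timings it prints are not DRAM latencies; a real benchmark would run the same loop in C over a flat array of many gigabytes.

```python
import random
import time


def sattolo_cycle(n: int, seed: int = 1) -> list[int]:
    """Build next-pointers that form ONE random cycle through all n slots.

    Sattolo's algorithm: like Fisher-Yates, but j is drawn from [0, i)
    instead of [0, i], which guarantees a single n-cycle.
    """
    rng = random.Random(seed)
    next_idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        next_idx[i], next_idx[j] = next_idx[j], next_idx[i]
    return next_idx


def chase(next_idx: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain: each index comes from the previous load,
    so the loads form a dependent chain and cannot overlap."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    # Return the final index; a C version needs this so the compiler
    # cannot drop the loop as dead code.
    return i


def ns_per_hop(n: int, steps: int, seed: int = 1) -> float:
    """Average wall-clock nanoseconds per hop (includes interpreter overhead)."""
    next_idx = sattolo_cycle(n, seed)
    t0 = time.perf_counter_ns()
    chase(next_idx, steps)
    return (time.perf_counter_ns() - t0) / steps


if __name__ == "__main__":
    # Small (cache-resident) vs. larger footprint. In C the gap approaches
    # cache vs. DRAM latency; in CPython it is likely masked by overhead.
    for n in (1 << 10, 1 << 18):
        print(f"n={n}: {ns_per_hop(n, 200_000):.1f} ns/hop")
```

Sattolo's algorithm is used rather than a plain shuffle because a random permutation read as next-pointers usually contains short cycles; a chase trapped in a small cycle stays in cache and reports cache latency instead of memory latency.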
>>
>>>
>>> An example of AMD's bandwidth scaling on a quad socket with 64 cores:
>>> http://cse.ucdavis.edu/bill/pstream/bm3-all.png
>>>
>>> I don't have a similar Intel, but I do have a dual-socket E5:
>>> http://cse.ucdavis.edu/bill/pstream/e5-2609.png
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
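Much of the disagreement in this thread is latency versus throughput. Bill's 2.25 ns is an aggregate rate (one cache line every 2.25 ns across 64 threads), while the 64 ns, 400+ ns and 700 ns figures are per-access latencies. Little's law (requests in flight = throughput × latency) relates the two. The helper names below are hypothetical, and the 64-byte cache line and the number of misses each thread keeps in flight are assumptions, not measurements from the thread.

```python
# Little's law for memory: misses_in_flight = throughput * latency.
# Assumptions (not from the thread): 64-byte cache lines, and a fixed
# number of outstanding misses per thread.

CACHE_LINE_BYTES = 64  # assumed; typical for x86 server CPUs


def implied_latency_ns(threads: int, ns_per_lookup: float,
                       in_flight_per_thread: int = 1) -> float:
    """Per-access latency implied by an aggregate lookup interval.

    latency = misses_in_flight / throughput
            = (threads * in_flight_per_thread) * ns_per_lookup
    """
    return threads * in_flight_per_thread * ns_per_lookup


def lookup_interval_ns(latency_ns: float, threads: int,
                       in_flight_per_thread: int = 1) -> float:
    """Aggregate time between completed lookups for a given per-access latency."""
    return latency_ns / (threads * in_flight_per_thread)


def aggregate_gb_per_s(ns_per_lookup: float,
                       line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Cache-line traffic; 1 byte/ns is exactly 1 GB/s (decimal)."""
    return line_bytes / ns_per_lookup


def ns_per_line_per_channel(ns_per_lookup: float, channels: int) -> float:
    """Average gap between lines served by each channel, assuming even spread."""
    return ns_per_lookup * channels


if __name__ == "__main__":
    # Bill's figures: 64 threads, 16 channels, one line every 2.25 ns.
    print(implied_latency_ns(64, 2.25))          # 144.0
    print(round(aggregate_gb_per_s(2.25), 1))    # 28.4
    print(ns_per_line_per_channel(2.25, 16))     # 36.0
    # Same 64 threads if each access took 400 ns (Vincent's best case).
    print(lookup_interval_ns(400.0, 64))         # 6.25
```

With one miss in flight per thread, one line every 2.25 ns from 64 threads implies roughly 144 ns per access under load, about 28.4 GB/s of cache-line traffic, and 36 ns between lines on each of the 16 channels. If cores overlap several independent misses, the implied per-access latency is proportionally higher; a dependent pointer chase allows only one miss in flight per thread.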