For the foreseeable future I'm not developing much, but I will use the hybrid SMP/DM capabilities in WRF. The hybrid mode takes advantage of SMP within each node and uses message passing between SMP nodes. I've not used this capability for benchmarking, but it appears to offer significant gains.

As we get more hybrid HPC capability, planning for this will become more important. A lot of system administrators (based on a statistical sample of 4 local ones) have decreed that hybrid is inefficient, and that one should do either pure shared memory or pure distributed memory so that we don't make our Gaussian users feel unloved. I'm skeptical.

gerry

Joseph Mack NA3T wrote:
I've searched the web and the Beowulf archives for "hybrid" || "multicore", and the only definitive statement I've found is by Greg Lindahl, 17 Dec 2004:

"Most of the folks interested in hybrid models a few years ago have now given it up".

I assume this was from the era of 2-way SMP nodes.

Multicore CPUs are being projected for 15 years into the future (statement by Pat Gelsinger, Intel's CTO, quoted in http://cook.rfe.org/grid.pdf).

I expect the programming model will be a little different for single-image machines like the Altix than for Beowulfs, where each node has its own kernel (and which I assume will be running dual quad-core mobos).

Still, if a flat, single-network model is used, all processes communicate through the off-board network. Someone with a quad-core machine, running MPI on a flat network, told me that their application scales poorly to 4 processors. If instead the processes on the cores within a package worked on adjacent parts of the compute volume and communicated through the on-board memory, then for a quad-core machine the off-board networking bandwidth requirement would drop by a factor of 4 and scaling would improve.
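For concreteness, here's a minimal sketch of that hybrid scheme, assuming MPI plus OpenMP; the slab size and the ring-style halo exchange are made up for illustration. One MPI rank per quad-core package does the off-board communication, while the four cores work on the rank's slab through shared on-board memory:

/* Hybrid MPI+OpenMP sketch: one MPI rank per quad-core package,
 * OpenMP threads sharing that rank's slab through on-board memory.
 * Compile (e.g.):  mpicc -fopenmp hybrid.c -o hybrid
 * Run (e.g.):      OMP_NUM_THREADS=4 mpirun -np <packages> ./hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NLOCAL 1000000   /* made-up per-rank slab size */

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* FUNNELED: only the master thread makes MPI calls, so the
     * off-board network sees one communicating process per package. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double *slab = malloc(NLOCAL * sizeof *slab);

    /* All four cores work on adjacent parts of the rank's slab,
     * communicating through shared (on-board) memory. */
    #pragma omp parallel for
    for (int i = 0; i < NLOCAL; i++)
        slab[i] = (double)rank * NLOCAL + i;

    /* Only the slab's boundary crosses the off-board network,
     * once per package instead of once per core. */
    double halo = slab[NLOCAL - 1], recv = 0.0;
    MPI_Sendrecv(&halo, 1, MPI_DOUBLE, (rank + 1) % nranks, 0,
                 &recv, 1, MPI_DOUBLE, (rank + nranks - 1) % nranks, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("%d ranks x %d threads\n", nranks, omp_get_max_threads());

    free(slab);
    MPI_Finalize();
    return 0;
}

With 4 threads per rank, the number of processes on the off-board network drops by the factor of 4 described above, which is the whole point of the exercise.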

In a quad-core machine, if 4 OpenMP threads are started on each quad-core package, could they be rescheduled at the end of their timeslice onto different cores, arriving at a cold cache? On a large single-image machine, could a thread be scheduled onto another node and have to communicate over the off-board network? In a single-image machine (with a single address space), how does the OS know to malloc memory from the on-board memory, rather than from some arbitrary location (on another board)?
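On Linux, at least, the usual answers are explicit CPU affinity (so a thread can't be migrated to a cold cache) and the first-touch page placement policy (pages land on the memory of the node whose CPU first writes them). A hedged sketch, assuming Linux's sched_setaffinity and a thread-to-core numbering that matches the package layout; the array size is made up:

/* Sketch: thread pinning and first-touch NUMA placement on Linux.
 * Assumes Linux + glibc + OpenMP; compile e.g. gcc -fopenmp numa.c
 */
#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity, CPU_ZERO, CPU_SET */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 4000000       /* made-up array size */

int main(void)
{
    double *a = malloc(N * sizeof *a);   /* pages not yet placed */

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        /* Pin each thread to one core so the scheduler can't move it
         * to a cold cache at the end of a timeslice. (Assumes thread
         * id == core id, which real codes map more carefully.) */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(tid, &mask);
        sched_setaffinity(0, sizeof mask, &mask);

        /* First touch: the kernel places each page on the node of the
         * CPU that first writes it, so each thread touching its own
         * chunk puts that chunk in its on-board memory. */
        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++)
            a[i] = 0.0;
    }

    printf("initialized %d doubles with %d threads\n",
           N, omp_get_max_threads());
    free(a);
    return 0;
}

The same static schedule must then be used in the compute loops, so each thread keeps working on the pages it placed.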

I expect everyone here knows all this. How is everyone going to program the quad-core machines?

Thanks Joe

--
Gerry Creager -- [EMAIL PROTECTED]
Texas Mesonet -- AATLT, Texas A&M University        
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843