On Wed, Dec 24, 2008 at 09:03:38PM +1100, Chris Samuel wrote:
> ----- "John Hearns" <hear...@googlemail.com> wrote:
>
> > SGI Altix have 'bootcpusets' which means you can slice
> > off one or two processors to take care of OS housekeeping
> > tasks,
>
> Now that cpusets have been in the mainline kernel for
> some time you should be able to do this with any modern
> distro.
>
> I contemplated doing this on our Barcelona cluster, but
> sacrificing 1 core in 8 was a bit too much of a high price
> to pay. But people with higher core counts per node might
> find it attractive.

This seems like a decision to settle by benchmarking, based on the
application load and the 'implied IO+OS' load, as well as on the
ability to localize the IO+OS activity to the sacrificed CPU core.

Of interest: CPU and system designers and OS engineers are set on the
SMP model, where all the parts are considered equal. This
simplification ignores the reality that interrupts, networking,
encryption and file IO are not floating point intensive and thus
leave FPU core transistors idle. The decisions are different when
dedicated IO channel processors or vector processors are built into
the hardware of the system.

Today's apparent cut-and-paste model of multi-core CPU design, where
the most critical design issues are at the memory (cache) interface,
pushes the issue out to the cluster user/manager and perhaps into the
batch system. Outside of heat issues, adding yet another FPU core is
almost free given today's transistor budgets.

For a long time I felt that Intel Hyper-Threading was an interesting
decision, in that it all but stated that floating point was a
second-class activity in the system. However, the complexity of
adding more execution units may have nixed further hyper-threading
efforts.

The benchmarking work (combined with CPU affinity) might be
interesting. Leaving 12.5% of the FPU resource on the table might
look like a lot at first, but since the other seven cores might be
idled by a slow rank sidetracked by interrupts and IO, the benchmark
FPU delta per rank need only be about one seventh of that (i.e. about
2%) to generate a net gain. This might be an easy percentage to gain
by localizing interrupts and IO so that user-space affinity does not
conflict with them. But this is not strictly SMP, so the hardware and
OS design may limit the gains.

Two percent in an eight-core system does not seem intuitive. Did I
get this turned inside out?

-- 
T o m M i t c h e l l
Found me a new hat, now what?
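
A rough way to check the break-even arithmetic in the post is sketched below in Python. It is only a back-of-envelope model, not a benchmark. It assumes a fixed amount of work split evenly across the compute ranks, with every rank running at the same effective rate. The function names are made up for illustration. Under those assumptions, the figure quoted in the post (12.5% divided by seven) comes out at about 1.8%, while the per-rank speedup that seven ranks need in order to match eight comes out at 1/7, or about 14.3%. A real job with uneven, jitter-dominated ranks may behave differently, which is why benchmarking is the real test.

```python
def sacrificed_fraction(cores: int, reserved: int = 1) -> float:
    """Fraction of the node's cores given up to OS and IO housekeeping."""
    return reserved / cores


def estimate_from_post(cores: int, reserved: int = 1) -> float:
    """The post's figure: the sacrificed fraction spread over the remaining ranks."""
    return sacrificed_fraction(cores, reserved) / (cores - reserved)


def break_even_gain(cores: int, reserved: int = 1) -> float:
    """Per-rank speedup needed so (cores - reserved) ranks match `cores` ranks.

    Assumes fixed total work, evenly divided, all ranks equally fast.
    """
    workers = cores - reserved
    return cores / workers - 1.0


if __name__ == "__main__":
    for cores in (8, 16, 32):
        print(
            f"{cores:2d} cores: sacrificed {sacrificed_fraction(cores):.1%}, "
            f"post's estimate {estimate_from_post(cores):.1%}, "
            f"break-even gain per rank {break_even_gain(cores):.1%}"
        )
```

Both figures shrink as the core count per node grows, which fits the remark in the quoted text that nodes with higher core counts might find a reserved core more attractive.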