On 02/08/17 13:37, Evan Burness wrote:

> Thanks for the history lessons, Chris! Very interesting indeed.

My pleasure. To add to the history, here's a paper from the APAC'05
conference 12 years ago that details how the then APAC (now NCI) set up
their SGI Altix cluster, including a discussion of cpusets.

http://www.kev.pulo.com.au/publications/apac05/apac05-apacnf-altix.pdf

It also includes an interesting section on dealing with SGI's
proprietary MPI stack and the problems it caused them.

> Would be interesting to take it a step further and measure what the
> impacts (good, bad, or otherwise) of picking a specific core on a given
> CPU uArch layout for the OS.

I was hoping that document would give some indication of the benefits
of reducing jitter via cpusets, but sadly it does not.

I'd be very interested to hear what people have found there. I do know
that Slurm allows you to bind cores to generic resources like GPUs, so
that an administrator can enforce that only certain cores can access
that resource (say the cores closest to a GPU).

https://slurm.schedmd.com/gres.html

It also supports "core specialisation", which is nebulously explained as:

https://slurm.schedmd.com/core_spec.html

# Core specialization is a feature designed to isolate system overhead
# (system interrupts, etc.) to designated cores on a compute node. This
# can reduce applications interrupts ranks to improve completion time.
# The job will be charged for all allocated cores, but will not be able
# to directly use the specialized cores.

Usefully, there is a PDF from the 2014 Slurm User Group which goes into
more detail about it, and includes references to work done by Cray and
others on the problems caused by jitter and the benefits of reducing it.

https://slurm.schedmd.com/SUG14/process_isolation.pdf

From that description it appears to only put the Slurm daemons for jobs
into the group, but of course there would be nothing to stop you having
a start-up script that moved any other existing processes onto that core
first via their own cgroup.

Shame that Bull's test was too small to show any benefit!

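To make the start-up script idea a bit more concrete, here is a minimal
sketch in Python. It is not what Slurm itself does; it assumes a cgroup v1
cpuset hierarchy (typically mounted under /sys/fs/cgroup/cpuset), and the
function name and the cgroup directory are hypothetical names chosen for
illustration.

```python
from pathlib import Path


def move_pids_to_cpuset(cgroup_dir, cpus, mems, pids):
    """Create a cpuset cgroup and move the given PIDs into it.

    cgroup_dir is a hypothetical directory inside a cgroup v1 cpuset
    hierarchy. cpus and mems are kernel-style list strings such as
    "0" or "0-1". Returns the PIDs that were accepted; PIDs the kernel
    refuses (for example processes that have already exited) are skipped.
    """
    cg = Path(cgroup_dir)
    # On a real cpuset hierarchy, mkdir is what creates the cgroup.
    cg.mkdir(parents=True, exist_ok=True)

    # cpuset v1 needs both cpus and mems set before any task can join.
    (cg / "cpuset.cpus").write_text(cpus)
    (cg / "cpuset.mems").write_text(mems)

    moved = []
    for pid in pids:
        try:
            # One PID per write: each write moves a single process.
            (cg / "cgroup.procs").write_text(str(pid))
            moved.append(pid)
        except OSError:
            # The kernel rejected this PID; leave it where it is.
            continue
    return moved
```

Run as root early in boot, a call such as
move_pids_to_cpuset("/sys/fs/cgroup/cpuset/system_overhead", "0", "0", pids)
would confine the listed processes to core 0, which you would then want
to match the core you hand to Slurm as specialised.
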
All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf