Hello List,
does anyone have experience with DefCpuPerGPU and jobs requesting
multiple partitions? I would expect Slurm to select a partition from
those requested by the job, then assign CPUs based on that
partition's DefCpuPerGPU. But according to my observations, it appears
that (at least sometimes) [...]
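
To illustrate the scenario (partition names, node names, and CPU counts
below are made up for the example, not our actual configuration):

   # slurm.conf -- two partitions with different per-GPU defaults
   PartitionName=small Nodes=nodeA[01-04] DefCpuPerGPU=4
   PartitionName=big   Nodes=nodeB[01-04] DefCpuPerGPU=16

   # one submission requesting either partition, one GPU, no explicit CPU count
   sbatch -p small,big --gres=gpu:1 job.sh

With defaults like these, I would expect a job that ends up in "big" to be
allocated 16 CPUs per GPU, and one that ends up in "small" to get 4.
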
On Fri, Feb 12, 2021 at 09:47:56AM +0100, Ole Holm Nielsen wrote:
>
> Could you kindly say where you have found documentation of the
> DefaultCpusPerGpu (or DefCpusPerGpu?) parameter.
Humph, I shouldn't have written the message from memory. It's actually
DefCpuPerGPU (singular).
> I'm unable to [...]
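
For reference, this is how the option can appear on a partition line in
slurm.conf (partition and node names below are only placeholders):

   PartitionName=gpu_part Nodes=gpunode[01-02] DefCpuPerGPU=8

It is described in the slurm.conf man page.
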
On Mon, Feb 08, 2021 at 12:36:06PM +0100, Ansgar Esztermann-Kirchner wrote:
> Of course, one could use different partitions for different nodes, and
> then submit individual jobs with CPU requests tailored to one such
> partition, but I'd prefer a more flexible approach where a given [...]
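
To make the two approaches from the quote concrete (partition names and
CPU counts are placeholders only):

   # per-partition submission, CPU request tailored to one node type
   sbatch -p small --gres=gpu:1 --cpus-per-task=4  job.sh
   sbatch -p big   --gres=gpu:1 --cpus-per-task=16 job.sh

   # the more flexible variant: one submission spanning several partitions,
   # relying on each partition's DefCpuPerGPU for the CPU count
   sbatch -p small,big --gres=gpu:1 job.sh
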
Hi Yair,
thank you very much for your reply. I'll keep the points you make in
mind while we're evolving our configuration toward something that can
be called production-ready.
A.
--
Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
http://www.mpibpc.mpg.de/grubmueller/esz
Hello List,
we're running a heterogeneous cluster (x86_64 only, but many different
node types, ranging from 8 to 64 hardware threads and 1 to 4 GPUs).
Our processing power (for our main application, at least) is
exclusively provided by the GPUs, so cons_tres looks quite promising:
depending on the size of the [...]
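
For context, the kind of configuration we're looking at is roughly the
following (node names, counts, and memory figures are simplified
placeholders, not our real inventory):

   # slurm.conf
   SelectType=select/cons_tres
   SelectTypeParameters=CR_Core_Memory
   GresTypes=gpu

   # heterogeneous node types
   NodeName=small[01-10] CPUs=8  Gres=gpu:1 RealMemory=32000
   NodeName=big[01-10]   CPUs=64 Gres=gpu:4 RealMemory=256000
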
Hello List,
I'm seeing a version clash when trying to start MPI jobs via srun.
In stderr, my executable (mdrun) complains about:
mdrun: /usr/lib/x86_64-linux-gnu/slurm/auth_munge.so: Incompatible Slurm plugin version (17.11.9)
I've checked my installation, and found nothing that suggests there [...]
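
In case it helps, these are the kinds of checks I mean (Debian-style
packaging assumed, since that's where the plugin path comes from):

   # version of the client tools in $PATH
   srun --version
   scontrol --version

   # which package (if any) owns the plugin mdrun complains about
   dpkg -S /usr/lib/x86_64-linux-gnu/slurm/auth_munge.so

   # all installed Slurm-related packages
   dpkg -l | grep -i slurm
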
Hi,
I'd like to share our set-up as well, even though it's very
specialized and thus probably won't work in most places. However, it's
also very efficient in terms of budget when it does.
Our users don't usually have shared data sets, so we don't need high
bandwidth at any particular point -- the [...]
Hi,
> On 05.02.19 16:46, Ansgar Esztermann-Kirchner wrote:
> > [...]-- we'd like to have two "half nodes", where
> > jobs will be able to use one of the two GPUs, plus (at most) half of
> > the CPUs. With SGE, we've put two queues on the nodes, [...]
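
One possible Slurm analogue of that SGE setup (only a sketch, assuming a
hypothetical 16-core node with two GPUs; node names and core ranges are
made up) is to bind each GPU to "its" half of the cores in gres.conf and
have jobs request one GPU plus a matching CPU count:

   # gres.conf -- tie each GPU to half of the cores
   NodeName=gpunode[01-04] Name=gpu File=/dev/nvidia0 Cores=0-7
   NodeName=gpunode[01-04] Name=gpu File=/dev/nvidia1 Cores=8-15

   # job side: one GPU and (at most) half of the CPUs
   sbatch --gres=gpu:1 --ntasks=1 --cpus-per-task=8 job.sh

Whether that gives a hard "half node" or just a locality hint depends on
the select plugin and version, so treat it as a starting point rather
than a recipe.
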
Hello List,
we're operating a large-ish cluster (about 900 nodes) with diverse
hardware. It has been running with SGE for several years now, but the
more we refine our configuration, the more we're feeling SGE's
limitations.
Therefore, we're considering switching to Slurm.
The latest challenge is [...]