Hi Matthias,
Let's get the simplest thing out of the way first: have you compiled OpenMPI
yourself, separately on both clusters, using the specific drivers for
whatever network you have on each? In my experience OpenMPI is quite
finicky about working correctly unless you do that, and when I don't, I
see exactly this kind of problem.
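
If it helps, a build along these lines is what I usually mean (the prefix and
the --with flags here are placeholders; pick the ones that match each cluster's
fabric and launcher):

    # Configure OpenMPI against the site's own interconnect and Slurm stack.
    ./configure --prefix=$HOME/sw/openmpi \
                --with-slurm --with-pmix \
                --with-ucx=/usr        # or --with-ofi=... for libfabric-based fabrics
    make -j$(nproc) && make install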
Ugh, I think I had not caught up with the docs.
I started with a system that defaults to cgroup v1, but the Slurm documentation
for that plugin was NOT available at the time, so I converted everything to
cgroup v2.
It appears that both are supported and that the documentation issue is
more on the dev side.
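
For anyone doing the same conversion, the change was roughly this shape (a
sketch only; check the kernel parameter against your distro and the plugin
name against your Slurm version's cgroup.conf man page):

    # /etc/default/grub: boot with only the unified (v2) hierarchy mounted
    GRUB_CMDLINE_LINUX="... systemd.unified_cgroup_hierarchy=1"

    # cgroup.conf: select the v2 plugin explicitly instead of autodetect
    CgroupPlugin=cgroup/v2

then regenerate the grub config, reboot the nodes, and restart slurmd.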
If you are letting systemd take most things over, you have systemd-cgtop,
which works better than top for your case. There is also systemd-cgls for a
non-interactive listing.
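
For example (the unit name in the last command is just a guess; adjust it to
however slurmd is named on your node):

    # Interactive, top-like view of CPU, memory and IO per control group
    systemd-cgtop
    # One-shot, non-interactive tree of the cgroup hierarchy
    systemd-cgls
    # Or narrow it to a single unit
    systemd-cgls -u slurmd.service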
Also, mind checking whether you are using cgroup v2? Looking at the cgroup
mounts would suffice, as cgroup v1 and v2 are probably not supposed to be
mixed.
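
Something like this should tell you which hierarchy you are on:

    # cgroup2fs on /sys/fs/cgroup means the unified v2 hierarchy is in use;
    # a tmpfs there with per-controller cgroup mounts underneath means v1.
    mount | grep cgroup
    stat -fc %T /sys/fs/cgroup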
Hello,
I'm running some tests in a very small testing environment (before applying them to
the real scenario). My environment is only a computer with an old Intel i4 with
this "lscpu" configuration:
Architecture:          x86_64
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
In addition to checking under /sys/fs/cgroup like Tim said, if this is just to
convince yourself that the CPU restriction is working, you could also open
`top` on the host running the job and observe that %CPU is now being held to
200.0 or lower (or, if the job has multiple processes, that their %CPU values
sum to 200.0 or lower).
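
If you'd rather not eyeball top, something along these lines adds up the %CPU
of every process in the job's cgroup (a sketch assuming the cgroup v1 layout
and a made-up job id 1234; run it on the compute node as root or the job owner):

    # Collect every PID attached to the job's cpuset cgroup, then sum their %CPU.
    pids=$(find /sys/fs/cgroup/cpuset/slurm/uid_$(id -u)/job_1234 \
             -name cgroup.procs -exec cat {} + | sort -un | paste -sd,)
    ps -o %cpu= -p "$pids" | awk '{ total += $1 } END { print total "%" }'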
Cgroups don't take effect until the job has started. It's a bit clunky, but
you can do things like this:

inspect_job_cgroup_memory ()
{
    # Grab the JobId and UserName of the last job squeue lists
    # (pass squeue filters such as -u <user> through "$@").
    set -- $(squeue "$@" -O JobId,UserName | sed -n '$p');
    # Start a shell inside the job's cgroup as its owner and read the memory
    # limit Slurm applied (cgroup v1 path).
    sudo -u "$2" srun --pty --jobid "$1" bash -c \
        'cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes'
}
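
For example, to peek at a particular user's most recently listed job (user
name made up):

    inspect_job_cgroup_memory -u alice

Note the path above is the cgroup v1 layout; on a cgroup v2 node the limit
lives in memory.max under a different hierarchy, so the string inside the
quotes would need adjusting.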
Hello,
Thanks for your answers, I will try them now! One more question: is there any way
to check whether the cgroup restrictions are working correctly during a running
job or during the Slurm scheduling process?
Thanks again!