[slurm-users] Re: Slurm 24.05 and OpenMPI

2025-03-26 Thread Davide DelVento via slurm-users
Hi Matthias, Let's take the simplest things out first: have you compiled OpenMPI yourself, separately on both clusters, using the specific drivers for whatever network you have on each? In my experience OpenMPI is quite finicky about working correctly, unless you do that. And when I don't, I see ex
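Davide's advice about building OpenMPI separately per cluster can be sketched roughly as follows; the prefix and `--with-*` paths are illustrative assumptions, not taken from the thread (adjust them to whatever network stack each cluster actually has, e.g. UCX for InfiniBand):

```shell
# Hypothetical per-cluster OpenMPI build; point --with-ucx/--with-pmix at
# the cluster's actual driver and PMIx installations before configuring.
./configure --prefix="$HOME/sw/openmpi" \
            --with-slurm \
            --with-pmix=/usr \
            --with-ucx=/usr
make -j "$(nproc)"
make install
```

Repeating this build on each cluster, against each cluster's own drivers, is what avoids the "finicky" behaviour described above.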

[slurm-users] Re: Using more cores/CPUs than requested with

2025-03-26 Thread Shunran Zhang via slurm-users
Ugh, I think I did not catch up with the docs. I started with a system that defaults to cgroup v1, but the Slurm doc for that plugin was NOT available at that time. Thus I converted everything to cgroup v2. It appears that both are supported and that the documentation issue is more on the dev side

[slurm-users] Re: Using more cores/CPUs than requested with

2025-03-26 Thread Shunran Zhang via slurm-users
If you are letting systemd take most things over, you have systemd-cgtop, which works better than top for your case. There is also systemd-cgls for non-interactive listing. Also, mind checking whether you are using cgroup v2? A look at the cgroup mount would suffice. As cgroup is likely not supposed to be
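The "check your cgroup mount" suggestion above can be done in one line; this is a minimal sketch relying on standard Linux behaviour (cgroup v2 mounts a single `cgroup2` filesystem on /sys/fs/cgroup, while v1 mounts a tmpfs there with per-controller subdirectories), not something quoted from the thread:

```shell
# Report which cgroup hierarchy this node is running.
fstype=$(stat -fc %T /sys/fs/cgroup)
if [ "$fstype" = "cgroup2fs" ]; then
    echo "unified hierarchy: cgroup v2"
else
    echo "legacy/hybrid hierarchy: cgroup v1 (fs type: $fstype)"
fi
```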

[slurm-users] Doubt with SelectTypeParameters in slurm.conf

2025-03-26 Thread Gestió Servidors via slurm-users
Hello, I'm running some tests in a very small testing environment (before applying changes in the real scenario). My environment is only a computer with an old Intel i4 with this "lscpu" configuration: Architecture: x86_64 CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core
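For a node shaped like this (8 logical CPUs, 2 threads per core), a slurm.conf fragment along the following lines is a common starting point. The node name and the CR_Core choice are illustrative assumptions, not the poster's actual configuration:

```
# Hypothetical fragment: schedule by core, so a task allocated one core
# gets both of that core's hyperthreads.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
NodeName=testnode CPUs=8 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2
```

With CR_CPU instead of CR_Core, Slurm would schedule individual hyperthreads; which is appropriate depends on the workload.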

[slurm-users] Re: Using more cores/CPUs than requested with

2025-03-26 Thread Laura Hild via slurm-users
In addition to checking under /sys/fs/cgroup like Tim said, if this is just to convince yourself that you got the CPU restriction working, you could also open `top` on the host running the job and observe that %CPU is now being held to 200,0 or lower (or if it's multiple processes per job, summin
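Summing %CPU across a job's processes, as suggested above, can be scripted rather than eyeballed in top. A rough sketch; using the job owner's user name here is an assumption (on a shared node you would instead sum over the PIDs in the job's cgroup.procs file):

```shell
# Sum %CPU over all processes of a given user on this node; for a job
# limited to 2 cores the total should stay near or below 200.
user="$(id -un)"   # substitute the job owner's user name
ps -u "$user" -o %cpu= | awk '{s += $1} END {printf "%.1f\n", s + 0}'
```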

[slurm-users] Re: Using more cores/CPUs than requested with

2025-03-26 Thread Cutts, Tim via slurm-users
Cgroups don’t take effect until the job has started. It’s a bit clunky, but you can do things like this: inspect_job_cgroup_memory () { set -- $(squeue "$@" -O JobId,UserName | sed -n '$p'); sudo -u $2 srun --pty --jobid "$1" bash -c 'cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/jo
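The archive preview truncates the function above mid-path. A sketch of the same idea follows; the cgroup path is an assumption based on the cgroup v1 memory-controller layout (with a glob rather than a guessed suffix), not copied from the full message:

```shell
# Attach a shell to a running job's allocation via srun --jobid, then read
# the job's memory limit from inside its own cgroup (v1 layout assumed).
inspect_job_cgroup_memory () {
    set -- $(squeue "$@" -O JobId,UserName | sed -n '$p')
    sudo -u "$2" srun --pty --jobid "$1" bash -c \
        'cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_*/memory.limit_in_bytes'
}
```

The trick is that the srun step joins the existing job's cgroup, so reading /sys/fs/cgroup from inside it shows the limits actually applied to that job.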

[slurm-users] Re: Using more cores/CPUs than requested with

2025-03-26 Thread Gestió Servidors via slurm-users
Hello, Thanks for your answers. I will try now!! One more question: is there any way to check whether Cgroups restrictions are working fine during a "running" job or during the SLURM scheduling process? Thanks again! -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an e
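Beyond the answers upthread, one quick way to look while a job is running is to locate its cgroup directory on the compute node and read the limit files directly. The paths below are assumptions (layouts differ between cgroup v1, cgroup v2, and systemd/Slurm versions), so treat this as a sketch:

```shell
# On the compute node: find Slurm's per-job cgroup directories.
# Path layout is an assumption; cgroup v1 uses per-controller trees.
find /sys/fs/cgroup -maxdepth 4 -type d -name 'job_*' 2>/dev/null || true
# For a directory found above, on cgroup v2 the CPU cap is in cpu.max,
# printed as "<quota> <period>" (e.g. "200000 100000" for 2 CPUs):
#   cat <jobdir>/cpu.max
```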