I added that line and restarted the service via
# systemctl restart slurmctld
However, I still get the same error.
Moreover, when I salloc, I don't see slurm/ in the cgroup path:
[shams@hpc ~]$ salloc
salloc: Granted job allocation 293
[shams@hpc ~]$ bin/show_my_cgroup --debug
bash: bin/show_my_cgrou
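One likely reason slurm/ does not show up here: by default the shell that salloc starts runs on the submission host, outside slurmstepd, so it is never placed in a job cgroup; only steps launched with srun are. A minimal check, assuming cgroup v1 and the task/cgroup plugin (the job ID is the allocation number granted above):

[shams@hpc ~]$ srun --jobid=293 --pty bash     # start a step inside the existing allocation
$ cat /proc/self/cgroup                        # the paths should now contain slurm/uid_<UID>/job_293/
$ cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_293/memory.limit_in_bytes   # present only if ConstrainRAMSpace=yes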
>depends on whether "ConstrainSwapSpace=yes" appears in cgroup.conf.
Thanks for the detail.
On the head node, mine is
# cat cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=no
ConstrainRAMSpace=no
Is that the root of the problem?
Regards,
Mahmood
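With ConstrainCores=no and ConstrainRAMSpace=no (and no ConstrainSwapSpace line), this cgroup.conf tells Slurm not to put any core or memory limit into the job's cgroup at all. For contrast, a sketch of a cgroup.conf that does constrain memory; the parameter names are from cgroup.conf(5), the values are only illustrative:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100      # cgroup RAM limit, as a percent of the job's allocated memory
AllowedSwapSpace=0       # additional swap allowed, as a percent of the allocated memory

Note that cgroup.conf is read by slurmd on the compute nodes, so restarting slurmctld alone (as earlier in the thread) does not apply a change there; slurmd must be restarted too.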
apologies for a long response; didn't have time for a shorter one ;)
>you have it backwards. slurm creates a cgroup for the job (step)
>and uses the cgroup control to tell the kernel how much memory to
>permit the job-step to use.
I would like to know how I can increase the threshold in the slurm config
files. I can not find it.
According to [1], " No value is provi
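There is no single global threshold to raise: the limit Slurm writes into a job's memory cgroup is derived from the memory granted to that job, optionally scaled by cgroup.conf, and capped by slurm.conf and accounting limits. A rough sketch of the knobs involved (parameter names from slurm.conf(5) and cgroup.conf(5); values illustrative):

# slurm.conf -- defaults and per-node ceiling for memory requests
DefMemPerCPU=4096        # MB granted per allocated CPU when a job does not specify --mem
MaxMemPerNode=128000     # largest per-node memory request a job may make, in MB

# cgroup.conf -- how the granted memory becomes a cgroup limit
ConstrainRAMSpace=yes
AllowedRAMSpace=100      # percent of the granted memory written to memory.limit_in_bytes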
>how much memory are you requesting from Slurm in your job?
#SBATCH --mem=38GB
also,
# sacctmgr list association format=user,grptres%30 | grep shams
shams cpu=10,mem=40G
Regards,
Mahmood
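To see what was actually applied to a job rather than what was requested, the standard accounting fields can help (the job ID is illustrative):

$ scontrol show job 293 | grep -i mem                       # memory the scheduler granted
$ sacct -j 293 --format=JobID,ReqMem,MaxRSS,MaxVMSize       # requested memory vs. measured RSS and virtual size

For reference, the GrpTRES line above (mem=40G) caps the combined memory of all of shams's running jobs, so a single --mem=38GB request still fits under it.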
Excuse me, I am confused by that.
While the cgroup value is 68GB, when I run on the terminal I see the VSZ is about
80GB and the program runs normally.
However, with slurm on that node, I can not run it.
Why can I run it on the terminal, but not via slurm?
I wonder if slurm gets the right value from
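On where Slurm keeps its value: the node-root memory.memsw.usage_in_bytes file quoted below is a node-wide usage counter, not a per-job limit; any limit Slurm enforces sits in the per-job directory it creates. A quick comparison while the job is running, assuming the cgroup v1 layout of the task/cgroup plugin (UID and job ID illustrative):

# cat /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
# cat /sys/fs/cgroup/memory/slurm/uid_1000/job_293/memory.limit_in_bytes
# cat /sys/fs/cgroup/memory/slurm/uid_1000/job_293/memory.memsw.limit_in_bytes

The first is current usage for the whole node; the second and third exist only when ConstrainRAMSpace / ConstrainSwapSpace are enabled.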
>Is it just fine to set a larger value (130GB) as below?
>echo 139586437120 > /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
of course not. "usage_in_bytes" is a readout of current usage, not a limit.
I see this
# cat /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
71496372224
which is about 68GB.
As I said, running from terminal has no problem.
Is it just fine to set a larger value (130GB) as below?
echo 139586437120 > /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
Regards,
Mahmood
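A note on that idea: under cgroup v1, memory.memsw.usage_in_bytes is a usage counter, not a limit, so echoing a value into it will not raise anything. The writable knobs are the *_limit_in_bytes files, and for a job the relevant ones are those Slurm manages in its per-job directory, not the node root (paths assume the task/cgroup plugin; UID and job ID illustrative):

# cat /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes                          # node-root RAM+swap limit
# cat /sys/fs/cgroup/memory/slurm/uid_1000/job_293/memory.memsw.limit_in_bytes   # what Slurm set for the job

Raising the per-job value is done by requesting more memory for the job, or by adjusting AllowedRAMSpace / AllowedSwapSpace in cgroup.conf, not by hand-editing these files.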
Yes, it uses a large value for virtual size.
Since I can run it from the terminal (outside of slurm), I think the kernel
parameters are OK.
In other words, I have to configure slurm for that purpose.
Which slurm configuration parameter is in charge of that?
Regards,
Mahmood
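If virtual size is the thing being limited, cgroups are probably not the mechanism: the memory cgroup constrains resident memory (and swap), not address space. Two Slurm-side settings worth checking are VSizeFactor (a virtual-memory limit derived from the job's real-memory limit) and PropagateResourceLimits (which copies the submitting shell's ulimits, including the address-space limit, into the job). A quick way to see how both are set:

$ scontrol show config | grep -iE 'vsizefactor|propagateresourcelimits'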
Does your Slurm cgroup configuration or node OS configuration limit the virtual
address space of processes? The "Error memory mapping" is thrown by blast when
trying to create a virtual memory mapping that exposes the contents of a file on
disk (see "man mmap") so the file can be accessed via pointers.
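A quick way to test that suggestion is to compare the address-space limit a plain terminal sees with the one a Slurm step sees on the same node, and to check the kernel settings that can also make mmap fail (<node> is a placeholder; the srun invocation is illustrative):

$ ulimit -v                                       # address-space limit in the terminal where the program works
$ srun -w <node> bash -c 'ulimit -v'              # the same limit as seen inside a Slurm step on that node
$ sysctl vm.overcommit_memory vm.max_map_count    # kernel knobs whose settings can also cause mmap to fail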