Excuse me, I am confused by that.
Although the cgroup memory limit is 68GB, when I run the program from a terminal I see
that its VSZ is about 80GB, and it runs normally.
However, I cannot run it through Slurm on that node.

How much memory are you requesting from Slurm in your job?
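A minimal sketch of what such a request looks like. Python is used here only to generate the batch script text. `--mem` is the standard sbatch per-node memory option. The helper name `make_batch_script`, the 80G figure, and the `blastp` command line are hypothetical placeholders, not values from this thread.

```python
def make_batch_script(mem_gb: int, command: str, job_name: str = "blast") -> str:
    """Return the text of a hypothetical sbatch script requesting mem_gb of RAM per node."""
    if mem_gb <= 0:
        raise ValueError("mem_gb must be positive")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # --mem is the per-node memory request; when the site constrains RAM
        # via cgroups, Slurm typically derives the job's memory limit from it.
        f"#SBATCH --mem={mem_gb}G",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder BLAST command line, for illustration only.
    print(make_batch_script(80, "blastp -query in.fa -db nr -out out.txt"), end="")
```

If the job requests less memory than the program actually touches, the job step's cgroup can kill or throttle it even though the same program runs fine in an unconstrained login shell.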

Why can I run it from a terminal but not via Slurm?

The purpose of Slurm is to allocate resources. Logging into a node "bare"
is "evading" everything Slurm does.
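One rough way to see whether a shell was started by Slurm is to look for the `SLURM_JOB_ID` environment variable, which sbatch, salloc and srun set for the processes they launch. This is only a heuristic sketch; the helper name `slurm_job_id` is hypothetical.

```python
import os
from typing import Mapping, Optional


def slurm_job_id(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Slurm job ID if this process appears to run under Slurm, else None."""
    env = os.environ if environ is None else environ
    return env.get("SLURM_JOB_ID")


if __name__ == "__main__":
    job = slurm_job_id()
    if job is None:
        # A bare login shell normally has no SLURM_JOB_ID set.
        print("not launched by Slurm")
    else:
        print(f"inside Slurm job {job}")
```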

I wonder whether Slurm gets the right value from the kernel's cgroup.

You have it backwards. Slurm creates a cgroup for the job (step)
and uses the cgroup controls to tell the kernel how much memory the job step is permitted to use.
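From inside a job step you can read back the limit that was set for the kernel to enforce. The sketch below assumes a Linux node with cgroups mounted at the usual `/sys/fs/cgroup`; it maps `/proc/self/cgroup` to `memory.max` (cgroup v2) or `memory/.../memory.limit_in_bytes` (cgroup v1). The function names are hypothetical, and the real cgroup layout depends on the site's Slurm and cgroup configuration.

```python
from pathlib import Path
from typing import Optional


def cgroup_memory_limit_file(proc_self_cgroup: str, root: str = "/sys/fs/cgroup") -> Optional[Path]:
    """Map the contents of /proc/self/cgroup to the file holding the memory limit."""
    v1_path = None
    v2_path = None
    for line in proc_self_cgroup.splitlines():
        if not line:
            continue
        hier_id, controllers, path = line.split(":", 2)
        rel = path.lstrip("/")
        if "memory" in controllers.split(","):
            # cgroup v1: e.g. '5:memory:/slurm/uid_1000/job_42/step_0'
            v1_path = Path(root) / "memory" / rel / "memory.limit_in_bytes"
        elif hier_id == "0" and controllers == "":
            # cgroup v2 unified hierarchy: e.g. '0::/system.slice/...'
            v2_path = Path(root) / rel / "memory.max"
    # On hybrid systems the memory controller lives in v1, so prefer it.
    return v1_path if v1_path is not None else v2_path


def parse_limit(text: str) -> Optional[int]:
    """Return the limit in bytes, or None for the cgroup v2 'max' (unlimited)."""
    value = text.strip()
    return None if value == "max" else int(value)


if __name__ == "__main__":
    proc = Path("/proc/self/cgroup")
    limit_file = cgroup_memory_limit_file(proc.read_text()) if proc.exists() else None
    if limit_file is None or not limit_file.exists():
        print("no memory limit file found")
    else:
        limit = parse_limit(limit_file.read_text())
        print("unlimited" if limit is None else f"{limit / 2**30:.1f} GiB")
```

Running it once under `srun` and once in a bare login shell should make the difference described above visible: the job step would normally report a limit derived from the memory request, while the bare shell usually does not.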

I would like to solve the problem locally for BLAST; I am not seeking a
system-wide solution right now.

There's nothing unique about your system or BLAST (which is extremely common
on many large Slurm installs).

regards, mark hahn
