Hi fellow Slurm users,
We have been struggling for a while with understanding how MaxRSS is
reported.

This is because jobs often die with a reported MaxRSS that is not even 10%
of the requested memory.

I just found the following document:
https://research.csc.fi/-/a

It says:
"maxrss = maximum amount of memory used at any time by any process in
that job. This applies directly for serial jobs. For parallel jobs you need
to multiply with the number of cores (max 16 or 24 as this is reported only
for that node that used the most memory)"

While 'man sacct' says:
"Maximum resident set size of all tasks in job."

Which explanation is correct? How should I be interpreting MaxRSS?
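
To make the difference concrete, here is a rough sketch of the arithmetic
under the two readings. The numbers are made up for illustration (they are
not from a real job), the function name is hypothetical, and I am not
claiming that either reading is what sacct actually does:

```python
# Hypothetical values for illustration only; not taken from a real job.
def csc_estimate_kib(max_rss_kib: int, cores: int) -> int:
    """Reading from the CSC document: for a parallel job, multiply the
    reported MaxRSS by the number of cores to estimate total memory."""
    return max_rss_kib * cores


max_rss_kib = 500_000  # MaxRSS as reported by sacct (assumed value)
cores = 16             # cores used by the job (assumed value)

# Reading 1 (CSC document): scale MaxRSS by the core count.
csc_reading_kib = csc_estimate_kib(max_rss_kib, cores)

# Reading 2 (one possible reading of 'man sacct'): take MaxRSS as reported.
man_reading_kib = max_rss_kib

print(f"CSC reading:      {csc_reading_kib} KiB")
print(f"man page reading: {man_reading_kib} KiB")
```

With these assumed numbers the two readings differ by a factor of 16, which
is large enough to decide whether a job looks like it stayed well under its
requested memory or went over it.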

Thanks,
Eli
