We are using cgroups to track the resource usage of our jobs. The jobs run
in Docker, with Docker's --cgroup-parent flag pointing at the Slurm job's
cgroup. This works well for limiting memory usage.
Unfortunately, the maximum memory usage (MaxRSS) is not accurately reported
in sacct. While the cgr
On 15/09/20 10:14, Diego Zuccato wrote:
It seems my corrections only work for single-node jobs.
For multi-node jobs, the calculation only considers the memory used on one
node and hence underestimates the real efficiency.
Can someone more knowledgeable than me spot the error?
TIA!
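
To illustrate the multi-node problem described above, here is a minimal Python sketch. It is not the script under discussion. It assumes the per-node peak memory values have already been collected somehow; the `peaks` dictionary, the node names, and both function names are hypothetical. It also assumes memory strings use 1024-based K/M/G/T suffixes. The point is only that the used memory has to be summed over all nodes before dividing by the total request.

```python
# Hypothetical helper names and data; suffixes are assumed to be 1024-based.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem(value):
    """Convert a string such as '512M' or '2G' to bytes.

    A value without a suffix is treated as a plain byte count.
    """
    value = value.strip()
    if value and value[-1].upper() in UNITS:
        return float(value[:-1]) * UNITS[value[-1].upper()]
    return float(value)


def memory_efficiency(peak_per_node, requested_per_node):
    """Return used/requested memory, with usage summed over all nodes."""
    used = sum(parse_mem(v) for v in peak_per_node.values())
    requested = parse_mem(requested_per_node) * len(peak_per_node)
    return used / requested


# Made-up example: two nodes, 4G requested on each.
peaks = {"node01": "3G", "node02": "1G"}
print(f"{memory_efficiency(peaks, '4G'):.2f}")
```

With these made-up numbers, counting only node01 against the full 8G request would give 3/8, while summing both nodes gives 4/8, which is the kind of underestimate described in the message.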
> I'm neither