Hi Ken,
Here is my slurm.conf:
ControlMachine=s19r2b08
AuthType=auth/none
CryptoType=crypto/openssl
JobCredentialPrivateKey=/home/bsc33/bsc33882/slurm_over_slurm/etc/slurm.key
JobCredentialPublicCertificate=/home/bsc33/bsc33882/slurm_over_slurm/etc/slurm.cert
MpiDefault=none
ProctrackTyp
Hi Aravindh
For our small 3-node cluster I've hacked together a per-node Python script that
collects current and peak CPU, memory and scratch disk usage data on all jobs
running on the cluster and builds a fairly simple web page based on it. It
shouldn't be hard to make it store those data points.
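The guts of it are roughly along these lines (a simplified sketch rather than
the real script; it assumes psutil is installed, that the sampler has enough
privilege to read other users' process environments, and it ties a process to
a job via the SLURM_JOB_ID in its environment):

import time
from collections import defaultdict

import psutil

peak_rss = defaultdict(int)   # job id -> highest total RSS (bytes) seen so far

def sample_once():
    """Sum RSS and CPU% per Slurm job for the processes on this node."""
    usage = defaultdict(lambda: [0, 0.0])
    for proc in psutil.process_iter():
        try:
            job_id = proc.environ().get("SLURM_JOB_ID")
            if not job_id:
                continue                              # not part of a Slurm job
            rss = proc.memory_info().rss
            cpu = proc.cpu_percent(interval=None)     # 0.0 on the first sample
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
        usage[job_id][0] += rss
        usage[job_id][1] += cpu
    for job_id, (rss, _) in usage.items():
        peak_rss[job_id] = max(peak_rss[job_id], rss)
    return usage

if __name__ == "__main__":
    while True:
        for job_id, (rss, cpu) in sorted(sample_once().items()):
            print("job %s: %d MiB (peak %d MiB), %.1f %%CPU"
                  % (job_id, rss // 2**20, peak_rss[job_id] // 2**20, cpu))
        time.sleep(30)

The scratch disk part and the web page are left out; the real thing would also
write each sample somewhere (CSV, sqlite, ...) rather than just printing it.
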
For the simpler questions (for the overall job step, not real-time), you can
run 'sacct --format=all' to get data on completed jobs, and then:
- compare the MaxRSS column to the ReqMem column to see how far off their
memory request was
- compare the TotalCPU column to the product of the NCPUS and Elapsed columns
to see how much of the allocated CPU time they actually used
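Something like this (a rough sketch; it assumes parsable sacct output and only
crudely handles the memory-unit suffixes, ignoring the old per-cpu/per-node
ReqMem distinction) pulls those columns out and does the two comparisons for a
given job id:

import subprocess
import sys

def to_seconds(t):
    # sacct time fields look like [DD-]HH:MM:SS or MM:SS.mmm
    days = 0
    if "-" in t:
        d, t = t.split("-", 1)
        days = int(d)
    parts = [float(x) for x in t.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0.0)
    h, m, s = parts
    return days * 86400 + h * 3600 + m * 60 + s

def to_mib(mem):
    # MaxRSS/ReqMem look like "12345K", "4000M", "4G", possibly with a
    # trailing "n"/"c" on ReqMem in older Slurm versions (just stripped here)
    mem = mem.rstrip("nc")
    if not mem:
        return 0.0
    scale = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 ** 2}
    if mem[-1] in scale:
        return float(mem[:-1]) * scale[mem[-1]]
    return float(mem) / 2 ** 20     # plain bytes

out = subprocess.run(
    ["sacct", "-j", sys.argv[1], "--noheader", "--parsable2",
     "--format=JobID,MaxRSS,ReqMem,NCPUS,TotalCPU,Elapsed"],
    capture_output=True, text=True, check=True).stdout

for line in out.strip().splitlines():
    jid, maxrss, reqmem, ncpus, totalcpu, elapsed = line.split("|")
    if maxrss and reqmem:
        print("%s: used %.0f MiB of %.0f MiB requested"
              % (jid, to_mib(maxrss), to_mib(reqmem)))
    wall = to_seconds(elapsed) if elapsed else 0.0
    if wall > 0 and ncpus and totalcpu:
        eff = to_seconds(totalcpu) / (int(ncpus) * wall)
        print("%s: CPU efficiency %.0f%%" % (jid, 100 * eff))

Run it with a job id as its only argument; note that the MaxRSS numbers show up
on the step lines (.batch etc.) rather than on the parent job line.
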
This is the idea behind XDMoD's SUPReMM. It does generate a ton of data,
though, so it does not scale well to very active systems (i.e. ones churning
through tens of thousands of jobs).
https://github.com/ubccr/xdmod-supremm
-Paul Edmon-
On 12/9/2018 8:39 AM, Aravindh Sampathkumar wrote:
Hi All.
I was wondering if anybody has thought of or hacked together a way to
record CPU and memory consumption of a job over its entire duration
and give a summary of the usage pattern within that job? Not just the MaxRSS
and CPU time that already get reported for every job.
I'm thinking more like
Hi,
On 7/12/2018 6:23 PM, Bjørn-Helge Mevik wrote:
Raymond Wan writes:
However, a more general question... I thought there was no fool-proof
way to watch the amount of memory a job is using. What if, within the
script, they ran another program using "nohup", for example? Wouldn't
Slurm be unable to track its memory usage?
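For what it's worth, nohup only makes the child ignore SIGHUP; it does not move
the process out of the job's cgroup, so cgroup-based tracking still accounts
for it. A quick way to convince yourself, sketched here for a suspect PID, is
to look at what /proc/<pid>/cgroup says:

def job_cgroups(pid):
    """Return the cgroup path(s) recorded for a process in /proc/<pid>/cgroup."""
    with open("/proc/%d/cgroup" % pid) as fh:
        # each line is "hierarchy-id:controllers:path" (cgroup v1) or "0::path" (v2)
        return [line.strip().split(":", 2)[2] for line in fh]

# e.g. print(job_cgroups(12345)) -- paths containing something like
# "/slurm/uid_<uid>/job_<jobid>/..." mean the nohup'ed child is still being
# accounted to that job.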