Re: [slurm-users] CPU & memory usage summary for a job

2018-12-10 Thread Carlos Fenoy
You can also use the InfluxDB profiling plugin I developed, which is included in the latest Slurm version. It will provide live CPU and memory usage per task, step, host, and job. You can then build a Grafana dashboard to display the live metrics. Regards, Carlos
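For readers who want to try this, enabling the plugin is roughly a two-file change. A minimal sketch, assuming a Slurm build that ships the acct_gather_profile/influxdb plugin and an InfluxDB instance that is already reachable (hostname, database name and credentials below are placeholders; check acct_gather.conf(5) for your version):

    # slurm.conf
    AcctGatherProfileType=acct_gather_profile/influxdb
    JobAcctGatherFrequency=task=30        # sampling interval in seconds

    # acct_gather.conf
    ProfileInfluxDBHost=influxhost:8086   # placeholder host:port
    ProfileInfluxDBDatabase=slurm_profiling
    ProfileInfluxDBDefault=ALL            # profile all data series by default
    ProfileInfluxDBUser=slurm             # only needed if the database requires auth
    ProfileInfluxDBPass=secret

Jobs then appear as time series in the named database; pointing a Grafana dashboard at that database is a separate, plugin-agnostic step.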

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-10 Thread Jacob Jenson
Would job profiling with HDF5 work as well? https://slurm.schedmd.com/hdf5_profile_user_guide.html  Jacob
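For completeness, the HDF5 route looks roughly like this. A sketch, assuming AcctGatherProfileType=acct_gather_profile/hdf5 is set in slurm.conf and a writable profile directory is configured on the compute nodes (the path below is a placeholder; details are in the guide linked above):

    # acct_gather.conf (on all compute nodes)
    ProfileHDF5Dir=/shared/slurm/profile   # placeholder shared path

    # ask for task-level profiling at submit time
    sbatch --profile=task myjob.sh

    # after the job finishes, merge the per-node files into one HDF5 file
    sh5util -j <jobid>

The merged file can then be inspected with any HDF5 viewer, or data series pulled out with sh5util's extract option.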

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Sam Hawarden
Hi Aravindh, For our small 3-node cluster I've hacked together a per-node Python script that collects current and peak CPU, memory, and scratch disk usage for all jobs running on the cluster and builds a fairly simple web page from it. It shouldn't be hard to make it store those data points …
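The script itself wasn't posted, but a stripped-down sketch of the same idea, a per-node snapshot of current CPU and memory per running job, could be built on scontrol listpids plus ps. This is a hypothetical reconstruction, not Sam's script; peak tracking, scratch usage and the web page are left out:

    #!/bin/bash
    # Run on a compute node: sum the current %CPU and RSS of every PID
    # slurmd is tracking, grouped by job id.  Requires a proctrack plugin
    # that supports "scontrol listpids" (e.g. proctrack/cgroup).
    scontrol listpids 2>/dev/null | awk 'NR > 1 {print $2, $1}' |
    while read -r jobid pid; do
        # a task may exit between the two commands; ignore missing PIDs
        ps -o pcpu=,rss= -p "$pid" 2>/dev/null | awk -v j="$jobid" '{print j, $1, $2}'
    done |
    awk '{cpu[$1] += $2; rss[$1] += $3}
         END {for (j in cpu) printf "job %-10s %6.1f %%CPU %8.1f MiB RSS\n", j, cpu[j], rss[j]/1024}'

Feeding the output into a web page or a time-series store is then just a matter of where you send these lines.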

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Renfro, Michael
For the simpler questions (for the overall job step, not real-time), you can run 'sacct --format=all' to get data on completed jobs, and then compare the MaxRSS column to the ReqMem column to see how far off their memory request was, and compare the TotalCPU column to the product of the NCPUS and Elapsed columns …
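A concrete form of that check for a single completed job, where <jobid> is a placeholder and the field names are standard sacct fields:

    sacct -j <jobid> --format=JobID,JobName,NCPUS,Elapsed,TotalCPU,ReqMem,MaxRSS

CPU efficiency is then roughly TotalCPU divided by (NCPUS x Elapsed). If your site installs the contributed seff script, 'seff <jobid>' reports the same CPU and memory efficiency figures directly.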

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Paul Edmon
This is the idea behind XDMoD's SUPReMM. It does generate a ton of data though, so it does not scale to very active systems (i.e. churning over tens of thousands of jobs). https://github.com/ubccr/xdmod-supremm -Paul Edmon-