Re: [slurm-users] backfill scheduler does not work for heterogeneous jobs (version 17.11)

2018-12-09 Thread Ana Jokanović
Hi Ken, Here is my slurm.conf: ControlMachine=s19r2b08 AuthType=auth/none CryptoType=crypto/openssl JobCredentialPrivateKey=/home/bsc33/bsc33882/slurm_over_slurm/etc/slurm.key JobCredentialPublicCertificate=/home/bsc33/bsc33882/slurm_over_slurm/etc/slurm.cert MpiDefault=none ProctrackTyp

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Sam Hawarden
Hi Aravindh, For our small 3-node cluster I've hacked together a per-node Python script that collects current and peak CPU, memory and scratch disk usage data on all jobs running on the cluster and builds a fairly simple web page based on it. It shouldn't be hard to make it store those data poin

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Renfro, Michael
For the simpler questions (for the overall job step, not real-time), you can run 'sacct --format=all' to get data on completed jobs, and then: - compare the MaxRSS column to the ReqMem column to see how far off their memory request was - compare the TotalCPU column to the product of the NCPUS and El
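The two comparisons described above can be sketched roughly as follows. This is a minimal illustration, not part of the original message: the helper names (parse_mem, parse_duration, memory_ratio, cpu_efficiency) are hypothetical, the unit handling assumes binary multiples and strips the trailing "n"/"c" scope letter that some Slurm versions append to ReqMem, and the wall-time argument assumes the truncated column name in the preview refers to elapsed time.

```python
import re

# Assumption: sacct memory suffixes are binary multiples (K = 1024 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem(value):
    """Convert a string such as '4000M', '2Gn' or '123456K' to bytes.

    A trailing 'n' (per node) or 'c' (per CPU) is accepted and ignored;
    the caller has to account for the per-CPU case separately.
    A value without a unit suffix is treated as bytes (an assumption).
    """
    match = re.fullmatch(r"([\d.]+)([KMGT]?)([nc]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, unit, _scope = match.groups()
    return float(number) * _UNITS.get(unit, 1)


def parse_duration(value):
    """Convert '[DD-][HH:]MM:SS[.mmm]' to seconds."""
    days, _, rest = value.strip().rpartition("-")
    parts = [float(p) for p in rest.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with zero
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def memory_ratio(max_rss, req_mem):
    """Fraction of the requested memory that the job step actually touched."""
    return parse_mem(max_rss) / parse_mem(req_mem)


def cpu_efficiency(total_cpu, ncpus, elapsed):
    """CPU time used divided by the CPU time that was reserved."""
    return parse_duration(total_cpu) / (ncpus * parse_duration(elapsed))


if __name__ == "__main__":
    # Hypothetical values, not taken from any real job record.
    print(memory_ratio("1024K", "4M"))
    print(cpu_efficiency("02:00:00", 4, "01:00:00"))
```

A ratio well below 1 in either function suggests the request was larger than the job needed; the field formats should be checked against the sacct output of the Slurm version in use before relying on the numbers.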

Re: [slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Paul Edmon
This is the idea behind XDMod's SUPReMM.  It does generate a ton of data though, so it does not scale to very active systems (i.e. churning over tens of thousands of jobs). https://github.com/ubccr/xdmod-supremm -Paul Edmon- On 12/9/2018 8:39 AM, Aravindh Sampathkumar wrote: Hi All. I was

[slurm-users] CPU & memory usage summary for a job

2018-12-09 Thread Aravindh Sampathkumar
Hi All. I was wondering if anybody has thought of or hacked around a way to record CPU and memory consumption of a job during its entire duration and give a summary of the usage pattern within that job? Not the MaxRSS and CPU Time that already get reported for every job. I'm thinking more like

Re: [slurm-users] possible to set memory slack space before killing jobs?

2018-12-09 Thread Raymond Wan
Hi, On 7/12/2018 6:23 PM, Bjørn-Helge Mevik wrote: Raymond Wan writes: However, a more general question... I thought there is no fool-proof way to watch the amount of memory a job is using. What if within the script they ran another program using "nohup", for example. Wouldn't slurm be u