Re: [slurm-users] Way MaxRSS should be interpreted

2018-04-17 Thread E.S. Rosenberg
> so MaxRSS is a good estimate of how much RAM is needed to run a given job.
>
> Gareth
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On Behalf Of* E.S. Rosenberg
> *Sent:* Tuesday, 17 April 2018 10:42 PM
> *To:* Slurm User Community

Re: [slurm-users] Way MaxRSS should be interpreted

2018-04-17 Thread E.S. Rosenberg
! Eli
On Tue, Apr 17, 2018 at 2:09 PM, Loris Bennett wrote:
> Hi Eli,
>
> "E.S. Rosenberg" writes:
>
> > Hi fellow slurm users,
> > We have been struggling for a while with understanding how MaxRSS is reported.
> >
> > This because jobs ofte

[slurm-users] Way MaxRSS should be interpreted

2018-04-17 Thread E.S. Rosenberg
Hi fellow slurm users, We have been struggling for a while with understanding how MaxRSS is reported. This is because jobs often die with MaxRSS sometimes not even approaching 10% of the requested memory. I just found the following document: https://research.csc.fi/-/a It says: "*maxrss* = maximum
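To make the "not even 10% of the requested memory" comparison concrete, here is a minimal sketch that converts memory strings into bytes and computes the ratio. It assumes values carry a K/M/G/T suffix in powers of 1024, and that any trailing per-node/per-CPU marker on the requested value has already been stripped. The helper names and the sample values are hypothetical.

```python
# Hypothetical helpers: turn memory strings such as "409600K" or "4G"
# into bytes and compare peak usage against the request.
# Assumption: suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem(value: str) -> int:
    """Return the number of bytes described by a suffixed memory string."""
    value = value.strip()
    if not value:
        return 0
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    # No recognised suffix: treat the value as plain bytes.
    return int(float(value))


def rss_fraction(max_rss: str, requested: str) -> float:
    """Return MaxRSS as a fraction of the requested memory."""
    requested_bytes = parse_mem(requested)
    if requested_bytes == 0:
        raise ValueError("requested memory must be non-zero")
    return parse_mem(max_rss) / requested_bytes


# Made-up sample values, not taken from a real job.
fraction = rss_fraction("409600K", "4G")
print(f"MaxRSS is {fraction:.1%} of the request")
```

A job that dies at a fraction this low suggests the reported figure is not the whole picture, which is exactly the question raised in this thread.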

[slurm-users] Troubleshooting scheduling

2018-03-25 Thread E.S. Rosenberg
Hi everyone, Is there a guide anywhere on how to figure out why jobs aren't being started? We have a cluster with nodes of mixed sizes/powers; currently roughly half the cluster is idle even though there are ~5k jobs queued. All jobs are queued due to priority while only 1 job is marked as waiting
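One possible first step for this kind of question is to count how many pending jobs carry each pending reason. The sketch below assumes the input is the output of `squeue -h -t PD -o %r` (one reason per line); the function name and the sample text are hypothetical.

```python
from collections import Counter


def tally_reasons(squeue_output: str) -> Counter:
    """Count pending reasons, one reason per non-empty input line."""
    return Counter(
        line.strip() for line in squeue_output.splitlines() if line.strip()
    )


# Made-up sample standing in for real squeue output.
sample = "Priority\nPriority\nResources\nPriority\n"
for reason, count in tally_reasons(sample).most_common():
    print(f"{count:6d}  {reason}")
```

This only summarises what the scheduler reports; it does not explain why nodes stay idle.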

[slurm-users] scontrol return values

2018-03-14 Thread E.S. Rosenberg
Hi fellow slurm users, Today I noticed that scontrol returns 0 when it denies a drain request because no reason was supplied. It seems to me that this is wrong behavior; it should return 1 or some other error code so that scripts will know that the node was not actually drained. Thanks, Eli Slurm
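Until the exit code can be trusted, a script could check the node state itself after asking for the drain. This is a minimal sketch, assuming `scontrol show node` prints a `State=` field that contains `DRAIN` once the request has been accepted. The function names and the sample text are hypothetical, and `drain_and_verify` needs a working Slurm installation to run.

```python
import re
import subprocess
from typing import Optional


def node_state(show_node_output: str) -> Optional[str]:
    """Extract the value of the State= field, or None if it is absent."""
    match = re.search(r"\bState=(\S+)", show_node_output)
    return match.group(1) if match else None


def is_draining(show_node_output: str) -> bool:
    """True if the reported node state includes a DRAIN flag."""
    state = node_state(show_node_output)
    return state is not None and "DRAIN" in state.upper()


def drain_and_verify(node: str, reason: str) -> bool:
    """Request a drain, then confirm it by inspecting the node state."""
    subprocess.run(
        ["scontrol", "update", f"NodeName={node}", "State=DRAIN",
         f"Reason={reason}"],
        check=False,  # the exit code is deliberately not relied upon
    )
    shown = subprocess.run(
        ["scontrol", "show", "node", node],
        capture_output=True, text=True, check=False,
    )
    return is_draining(shown.stdout)


# Made-up sample text, not captured from a real cluster.
sample = "NodeName=node01 Arch=x86_64\n   State=IDLE+DRAIN ThreadsPerCore=1\n"
print(is_draining(sample))
```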

[slurm-users] Elasticsearch Jobcomp Plugin broken in elastic 6.

2017-12-20 Thread E.S. Rosenberg
Hi slurm-users, Recently we updated to Elasticsearch 6 and this ended up breaking the jobcomp plugin for slurm. The reason is that elastic has become picky about the HTTP Content-Type header, which is never explicitly set in slurm and therefore defaults to application/x-www-form-urlencoded. The ch
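To illustrate the header issue outside of slurm, here is a minimal sketch of a request that sets Content-Type explicitly to application/json. It is not the plugin's code; the URL, index name, and document fields are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_index_request(url: str, doc: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body and an explicit Content-Type."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and document, for illustration only.
req = build_index_request(
    "http://localhost:9200/slurm/jobcomp",
    {"jobid": 1234, "state": "COMPLETED"},
)
# urllib normalises the header name to "Content-type" internally.
print(req.get_header("Content-type"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`.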