Re: [slurm-users] Memory oversubscription and scheduling

2018-05-11 Thread Chris Samuel
Hey Michael! On Friday, 11 May 2018 1:00:24 AM AEST Michael Jennings wrote: > I'm surprised to hear that; this is the first time I've ever heard > that in regards to SLURM. I'd only ever heard folks complain about > TORQUE having that issue. Hmm, you might well be right, I might have done that

Re: [slurm-users] Memory oversubscription and scheduling

2018-05-10 Thread Michael Jennings
On Thursday, 10 May 2018, at 20:02:37 (+1000), Chris Samuel wrote: > For instance there's the LBNL Node Health Check (NHC) system that plugs into > both Slurm and Torque. > > https://slurm.schedmd.com/SUG14/node_health_check.pdf > > https://github.com/mej/nhc > > At ${JOB-1} we would run our i…
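
As an aside, here is a minimal sketch of how NHC is typically wired into Slurm via the health-check hooks; the paths, interval, and thresholds below are assumptions for illustration, not values taken from this thread:

    # slurm.conf: run NHC periodically on every node
    HealthCheckProgram=/usr/sbin/nhc
    HealthCheckInterval=300
    HealthCheckNodeState=ANY

    # /etc/nhc/nhc.conf: drain the node if memory looks unhealthy
    * || check_hw_physmem 128gb 128gb    # expected installed RAM (example value)
    * || check_hw_mem_free 1gb           # minimum free RAM
    * || check_hw_swap_free 512mb        # minimum free swap

When a check fails, NHC marks the node offline with a reason, so Slurm stops dispatching new jobs to it until it is healthy again.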

Re: [slurm-users] Memory oversubscription and scheduling

2018-05-10 Thread Chris Samuel
On Monday, 7 May 2018 11:58:38 PM AEST Cory Holcomb wrote: > Thank you for the reply; I was beginning to wonder if my message was seen. It's a busy list at times. :-) > While I understand how batch systems work, if you have a system daemon that > develops a memory leak and consumes the memory o…

Re: [slurm-users] Memory oversubscription and scheduling

2018-05-07 Thread Cory Holcomb
Thank you for the reply; I was beginning to wonder if my message was seen. While I understand how batch systems work, if you have a system daemon that develops a memory leak and consumes memory outside of its allocation, not checking the used memory on the box before dispatch seems like a good w…
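
One common mitigation for the "daemon leaks memory outside any job's allocation" problem is to reserve a slice of RAM for system use in the node definition; the node names and sizes below are assumptions for illustration:

    # slurm.conf node definition (illustrative values, memory in MB)
    NodeName=node[001-010] CPUs=32 RealMemory=126976 MemSpecLimit=4096 State=UNKNOWN

Here RealMemory is advertised slightly below the physically installed 128 GB, and MemSpecLimit sets aside memory that Slurm will not hand out to jobs, so the last few GB stay available for the OS and its daemons.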

Re: [slurm-users] Memory oversubscription and scheduling

2018-05-05 Thread Chris Samuel
On Thursday, 26 April 2018 3:28:19 AM AEST Cory Holcomb wrote: > It appears that I have a configuration that only takes into account the > allocated memory before dispatching. With batch systems, the idea is for users to set constraints for their jobs so the scheduler can backfill other jobs…
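
As an example of the per-job constraints mentioned here, a hypothetical batch script that states its memory requirement explicitly so the scheduler can pack and backfill around it (job name and application are placeholders):

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --mem=4G            # memory per node the scheduler reserves for this job
    #SBATCH --cpus-per-task=2
    #SBATCH --time=01:00:00
    srun ./my_app               # placeholder application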

[slurm-users] Memory oversubscription and scheduling

2018-04-25 Thread Cory Holcomb
Hello. Is there a configuration where the scheduler will check for enough free memory on a host before dispatching a job? It appears that I have a configuration that only takes into account the allocated memory before dispatching. My goal is to allow jobs to overuse memory but not have other job…
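
For reference, a sketch of the settings usually involved in making memory a consumable, enforceable resource in Slurm; the exact values are assumptions for illustration:

    # slurm.conf: schedule on memory as well as cores
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    DefMemPerCPU=2048           # default MB per CPU when a job requests none
    TaskPlugin=task/cgroup      # enforce the request at run time

    # cgroup.conf
    ConstrainRAMSpace=yes

Even with these settings the scheduler works from allocated memory, not from the free memory currently observed on the node, which is exactly the distinction raised in this question.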