Re: [slurm-users] how to find out why a job won't run?

2018-11-26 Thread R. Paul Wiegand
Steve, This doesn't really address your question, and I am guessing you are aware of this; however, since you did not mention it: "scontrol show job <jobid>" will give you a lot of detail about a job (a lot more than squeue). Its "Reason" is the same as sinfo and squeue, though. So no help there. I'v…
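
For reference, a minimal sketch of the commands being compared here (the job ID 12345 and the squeue format string are only placeholders):

    # Full per-job detail, including the Reason field
    scontrol show job 12345

    # The terser squeue view; %R is the same Reason/nodelist field
    squeue -j 12345 -o "%.10i %.9P %.20j %.8T %.10M %R"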

Re: [slurm-users] Reserving a GPU

2018-10-22 Thread R. Paul Wiegand
I had the same question and put in a support ticket. I believe the answer is
that you cannot.

On Mon, Oct 22, 2018, 11:51 Christopher Benjamin Coffey <chris.cof...@nau.edu> wrote:
> Hi,
>
> I can't figure out how one would create a reservation to reserve a gres
> unit, such as a gpu. The man pa…
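
For context, the reservation syntax in question looks roughly like the sketch below. The node-level form was what worked at the time; the commented TRES= form is only an assumption about what later Slurm releases support, not something confirmed in this thread.

    # Node-level reservation (names, times, and node list are placeholders)
    scontrol create reservation reservationname=gpu_maint \
        starttime=now duration=120 users=root nodes=gpunode01 flags=maint

    # Assumed syntax for later releases that allow TRES in reservations:
    # scontrol create reservation reservationname=gpu_res \
    #     starttime=now duration=120 users=alice tres=gres/gpu:1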

Re: [slurm-users] x11 forwarding not available?

2018-10-15 Thread R. Paul Wiegand
I believe you also need:

    X11UseLocalhost no

> On Oct 15, 2018, at 7:07 PM, Dave Botsch wrote:
>
> Hi.
>
> X11 forwarding is enabled and works for normal ssh.
>
> Thanks.
>
> On Mon, Oct 15, 2018 at 09:55:59PM +, Rhian Resnick wrote:
>>
>> Double check /etc/ssh/sshd_config allows X…
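
Put together, the sshd settings being checked in this thread amount to something like the sketch below (standard OpenSSH options; values are illustrative):

    # /etc/ssh/sshd_config on the node accepting the X11-forwarded session
    X11Forwarding yes
    X11UseLocalhost no

    # then reload the daemon, e.g.
    #   systemctl restart sshd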

Re: [slurm-users] GPU / cgroup challenges

2018-05-21 Thread R. Paul Wiegand
…Samuel wrote:
>
> On Wednesday, 2 May 2018 11:04:34 PM AEST R. Paul Wiegand wrote:
>
>> When I set "--gres=gpu:1", the slurmd log does have encouraging lines such
>> as:
>>
>> [2018-05-02T08:47:04.916] [203.0] debug: Allowing access to device
>> /dev…
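
One way to cross-check what those slurmd debug lines report is to look at the devices cgroup from inside the job step. The sketch below assumes cgroup v1 and Slurm's usual uid_*/job_*/step_* hierarchy, so the exact path may differ on other setups.

    # From within a job started with --gres=gpu:1
    srun --gres=gpu:1 nvidia-smi -L    # should list exactly one GPU

    # Allowed device entries for the current job step (path is an assumption)
    cat /sys/fs/cgroup/devices/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/step_0/devices.list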

Re: [slurm-users] GPU / cgroup challenges

2018-05-02 Thread R. Paul Wiegand
…manager.

On Tue, May 1, 2018 at 8:29 PM, Christopher Samuel wrote:
> On 02/05/18 10:15, R. Paul Wiegand wrote:
>
>> Yes, I am sure they are all the same. Typically, I just scontrol
>> reconfig; however, I have also tried restarting all daemons.
>
> Understood. Any…
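
For completeness, the two ways of pushing out configuration changes mentioned above look like this (systemd unit names are the usual ones, but may differ per distribution):

    # Ask the running daemons to re-read the configuration
    scontrol reconfigure

    # Or restart the daemons outright
    systemctl restart slurmctld    # on the controller
    systemctl restart slurmd       # on each compute node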

Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
…upgrade. Should I just wait and test after the upgrade?

On Tue, May 1, 2018, 19:56 Christopher Samuel wrote:
> On 02/05/18 09:31, R. Paul Wiegand wrote:
>
> > Slurm 17.11.0 on CentOS 7.1
>
> That's quite old (on both fronts, RHEL 7.1 is from 2015), we started on
> that same…

Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Slurm 17.11.0 on CentOS 7.1

On Tue, May 1, 2018, 19:26 Christopher Samuel wrote:
> On 02/05/18 09:23, R. Paul Wiegand wrote:
>
> > I thought including the /dev/nvidia* would whitelist those devices
> > ... which seems to be the opposite of what I want, no? Or do I
> >…

Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Thanks Chris. I do have the ConstrainDevices turned on. Are the differences
in your cgroup_allowed_devices_file.conf relevant in this case?

On Tue, May 1, 2018, 19:23 Christopher Samuel wrote:
> On 02/05/18 09:00, Kevin Manalo wrote:
>
> > Also, I recall appending this to the bottom of
> >
> >…
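
For readers following the thread, a minimal cgroup.conf with device constraining enabled looks roughly like the sketch below; the exact option set and paths are assumptions, not the posters' actual files.

    # /etc/slurm/cgroup.conf
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes
    ConstrainDevices=yes
    AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf

    # slurm.conf must also select the cgroup task plugin, e.g.
    #   TaskPlugin=task/cgroup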

Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
…yours
> ...
> /dev/nvidia*
>
> There was a SLURM bug issue that made this clear, not so much in the
> website docs.
>
> -Kevin
>
> On 5/1/18, 5:28 PM, "slurm-users on behalf of R. Paul Wiegand" <
> slurm-users-boun...@lists.schedmd.com on behalf of rpwi…
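
The file Kevin is describing is the cgroup device whitelist; a typical version, with the /dev/nvidia* line appended as suggested, might look like this (the entries above the nvidia line follow the stock example and are only illustrative):

    # /etc/slurm/cgroup_allowed_devices_file.conf
    /dev/null
    /dev/urandom
    /dev/zero
    /dev/sda*
    /dev/cpu/*/*
    /dev/pts/*
    /dev/nvidia*
    # With ConstrainDevices=yes, the gres plugin still limits each job
    # to the GPUs it was actually allocated.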

[slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Greetings, I am setting up our new GPU cluster, and I seem to have a problem
configuring things so that the devices are properly walled off via cgroups.
Our nodes each have two GPUs; however, if --gres is unset, or set to
--gres=gpu:0, I can access both GPUs from inside a job. Moreover, if I ask fo…
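
To make the setup being described concrete, a sketch of the GRES definition and the check for the symptom follows; node names, partition, and device paths are placeholders, and the slurm.conf side (GresTypes=gpu, a node definition with Gres=gpu:2) is assumed rather than shown in the thread.

    # /etc/slurm/gres.conf on a two-GPU node
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

    # With device constraining working, the first command should show no
    # GPUs (nvidia-smi may simply fail) and the second exactly one:
    srun -p gpu --gres=gpu:0 nvidia-smi -L
    srun -p gpu --gres=gpu:1 nvidia-smi -L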