Re: [slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
>No, but you'll need to logout of rocks7 and ssh back into it. >Are you physically logged into rocks7? Or are you connecting via SSH? $DISPLAY = :1 kind of means that you are physically logged into the machine I am connecting through a VNC session. Right now, I have access to the desktop of the f

[slurm-users] Nodes not responding... how does slurm track it?

2019-05-14 Thread Bill Broadley
My latest addition to a cluster results in a group of the same nodes periodically getting listed as "not-responding" and usually (but not always) recovering. I increased logging up to debug3 and see messages like: [2019-05-14T17:09:25.247] debug: Spawning ping agent for bigmem[1-9],bm[1,7,9-13
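The ping agent log line names its targets with Slurm's compact hostlist notation (for example `bigmem[1-9]`). On a real cluster, `scontrol show hostnames <hostlist>` expands such an expression. The sketch below is a hypothetical, simplified Python expander that shows how the notation maps to individual node names. It assumes at most one bracket group per name and is not Slurm's own parser.

```python
import re
from typing import List

# One bracket group per name, e.g. "bm[1,7,9-10]" -> prefix "bm", ranges "1,7,9-10".
_GROUP_RE = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<ranges>[^\[\]]+)\](?P<suffix>[^\[\]]*)$")


def _split_top_level(expr: str) -> List[str]:
    """Split on commas that are not inside square brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return [p for p in parts if p]


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple Slurm-style hostlist (hypothetical helper, not Slurm code)."""
    hosts: List[str] = []
    for part in _split_top_level(expr):
        m = _GROUP_RE.match(part)
        if m is None:
            hosts.append(part)  # plain host name such as "compute-0-0"
            continue
        prefix, suffix = m.group("prefix"), m.group("suffix")
        for rng in m.group("ranges").split(","):
            if "-" in rng:
                lo, hi = rng.split("-", 1)
                width = len(lo)  # preserve zero padding, e.g. node[01-03]
                for n in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{n:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{rng}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("bigmem[1-3],bm[1,7]"))
```

Expanding the list this way makes it easier to check whether the nodes that periodically go "not-responding" share something in common, such as a switch or rack.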

Re: [slurm-users] Issue with x11

2019-05-14 Thread Sean Crosby
Hi Mahmood, Are you physically logged into rocks7? Or are you connecting via SSH? $DISPLAY = :1 kind of means that you are physically logged into the machine Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physi

Re: [slurm-users] Issue with x11

2019-05-14 Thread Christopher Samuel
On 5/14/19 5:09 PM, Mahmood Naderan wrote: Should I modify that parameter on compute-0-0 too? No, but you'll need to logout of rocks7 and ssh back into it. Or are you on the console of the system itself? -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
>What does this say? >echo $DISPLAY On frontend of compute-0-0? [mahmood@rocks7 ~]$ echo $DISPLAY :1 >To get native X11 working with SLURM, we had to add this config to sshd_config on the login node (your rocks7 host) >X11UseLocalhost no >You'll then need to restart sshd I checked that and it

Re: [slurm-users] Issue with x11

2019-05-14 Thread Sean Crosby
Hi Mahmood, To get native X11 working with SLURM, we had to add this config to sshd_config on the login node (your rocks7 host) X11UseLocalhost no You'll then need to restart sshd Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing |

Re: [slurm-users] Issue with x11

2019-05-14 Thread Christopher Samuel
On 5/14/19 4:00 PM, Mahmood Naderan wrote: srun: error: Cannot forward to local display. Can only use X11 forwarding with network displays. What does this say? echo $DISPLAY All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
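The error message suggests that `srun --x11` refuses a DISPLAY that names no host, such as `:1` from a VNC session or the console. A DISPLAY set by `ssh -X`/`-Y` usually carries a host part, for example `localhost:10.0`. The sketch below is a hypothetical Python helper built on the standard X11 `[host]:display[.screen]` syntax. It assumes that an empty host or `unix` means a local display. Slurm's exact check is not quoted in this thread, so treat the helper as an illustration only.

```python
import re
from typing import Optional, Tuple

# X11 DISPLAY syntax: [host]:display[.screen]
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(value: str) -> Optional[Tuple[str, int, int]]:
    """Return (host, display, screen), or None if value is not a DISPLAY string."""
    m = _DISPLAY_RE.match(value)
    if m is None:
        return None
    return m.group("host"), int(m.group("display")), int(m.group("screen") or 0)


def is_network_display(value: str) -> bool:
    """Assumption: an empty host or 'unix' means a local (non-network) display."""
    parsed = parse_display(value)
    if parsed is None:
        return False
    return parsed[0] not in ("", "unix")


if __name__ == "__main__":
    for d in (":1", "localhost:10.0", "rocks7:10.0"):
        print(d, "network" if is_network_display(d) else "local")
```

Under this assumption, the `:1` reported from the VNC desktop falls on the "local" side, which matches the error. Logging in to rocks7 over SSH with X11 forwarding should give a DISPLAY that includes a host.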

[slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
Hi, I think I have asked this question before, but I wasn't able to fix it. While the "xclock" command works with "ssh -Y", srun with the --x11 option fails to open xclock. [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-0 --account y4 --partition RUBY -n 1 -c 4 --mem=1GB xclock srun: error: Cannot forwa

[slurm-users] Call for Abstracts - 2019 Slurm User Group Meeting

2019-05-14 Thread Jacob Jenson
You are invited to submit an abstract of a tutorial, technical presentation or site report to be given at the 2019 Slurm User Group Meeting. This event is sponsored and organized by the University of Utah and SchedMD. This international event is open to those who want to: - Learn more about S

[slurm-users] Registration for 2019 Slurm User Group Meeting is Open

2019-05-14 Thread Jacob Jenson
Registration for the 2019 Slurm User Group Meeting is open. You can register at https://slug19.eventbrite.com/ The meeting will be held on 17-18 September 2019 in Salt Lake City at the University of Utah - *Early registration* - May 14 through July 14 - $300 USD - *Standard regi

Re: [slurm-users] Regression with srun and task/affinity

2019-05-14 Thread Jason Bacon
On 2019-05-14 09:24, Jason Bacon wrote: On 2018-12-16 09:02, Jason Bacon wrote: Good morning, We've been running 17.02.11 for a long time and upon testing an upgrade to the 18 series, we discovered a regression. It appeared somewhere between 17.02.11 and 17.11.7. Everything works fine und

Re: [slurm-users] Regression with srun and task/affinity

2019-05-14 Thread Jason Bacon
On 2018-12-16 09:02, Jason Bacon wrote: Good morning, We've been running 17.02.11 for a long time and upon testing an upgrade to the 18 series, we discovered a regression. It appeared somewhere between 17.02.11 and 17.11.7. Everything works fine under 17.02.11. Under later versions, every