Re: [slurm-users] sbatch sending the working directory from the controller to the node

2020-01-21 Thread Dean Schulze
So there is a --chdir option for sbatch too. This implies that the same path has to exist on all nodes. Something to keep in mind when creating a Slurm cluster.
On Tue, Jan 21, 2020 at 12:58 PM William Brown wrote:
> The srun man page says:
> When initiating remote processes *srun* will propaga
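A minimal sketch of the point above, not taken from the thread: a batch script that sets its working directory with --chdir and prints where it actually started. The job name and the choice of /tmp are assumptions for illustration; any directory would do as long as it exists on every node that may run the job.

```shell
#!/bin/bash
#SBATCH --job-name=chdir_demo
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
# Assumption: /tmp exists on every compute node, so it is a safe --chdir target.
#SBATCH --chdir=/tmp

# Record and print the directory the job started in. Under Slurm this should
# be the --chdir path; run directly with bash, the #SBATCH lines are plain
# comments and this is simply the current directory.
job_dir="$(pwd)"
echo "Job working directory: ${job_dir}"
```

Submitted with sbatch, the echoed path shows up in the job's output file, which gives a quick way to confirm that the directory was found on the node.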

Re: [slurm-users] sbatch sending the working directory from the controller to the node

2020-01-21 Thread William Brown
The srun man page says:
When initiating remote processes srun will propagate the current working directory, unless --chdir=<path> is specified, in which case path will become the working directory for the remote processes.
William
From: slurm-users On Behalf Of Dean Schulze
Sent: 21 Janua

[slurm-users] sbatch sending the working directory from the controller to the node

2020-01-21 Thread Dean Schulze
I run this sbatch script from the controller:
===
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --mail-type=NONE    # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=00:05:00     # Time limit hrs:min:sec
#SBATCH --output=test_job_

Re: [slurm-users] Node can't run simple job when STATUS is up and STATE is idle

2020-01-21 Thread Dean Schulze
Thank you, thank you, thank you. It was the firewall on CentOS 7. Once I disabled that, it worked. For anyone else who runs into this issue, here is how to disable the firewall on CentOS 7: https://linuxize.com/post/how-to-stop-and-disable-firewalld-on-centos-7/
On Tue, Jan 21, 2020 at 7:24 AM

Re: [slurm-users] Node node00x has low real_memory size & slurm_rpc_node_registration node=node003: Invalid argument

2020-01-21 Thread Robert Kudyba
> are you sure, your 24 core nodes have 187 TERABYTES memory?
> As you yourself cited:
> Size of real memory on the node in megabytes
> The settings in your slurm.conf:
> NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2 Gres=gpu:1
> so, your machines should h
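The arithmetic behind the "187 TERABYTES" remark can be checked in a few lines. This is a sketch that assumes binary units (1024 MB per GB, 1024 GB per TB) and integer division; the variable names are illustrative only.

```shell
#!/bin/bash
# RealMemory value from the quoted slurm.conf line, read as megabytes.
real_memory_mb=196489092

# Integer conversion with binary units: MB -> GB -> TB (fractions are dropped).
real_memory_gb=$(( real_memory_mb / 1024 ))
real_memory_tb=$(( real_memory_gb / 1024 ))

echo "RealMemory=${real_memory_mb} MB is about ${real_memory_tb} TB"
```

Read as megabytes, the configured value comes out at roughly 187 TB per node, which is the figure questioned in the reply.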

Re: [slurm-users] Node can't run simple job when STATUS is up and STATE is idle

2020-01-21 Thread Brian Johanson
On 1/21/2020 12:32 AM, Chris Samuel wrote:
On 20/1/20 3:00 pm, Dean Schulze wrote:
There's either a problem with the source code I cloned from github, or there is a problem when the controller runs on Ubuntu 19 and the node runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to see