Re: [slurm-users] SLURM_JOB_GPU not set in salloc

2019-01-18 Thread Chris Samuel
On 18/1/19 3:18 am, Henkel wrote: We just found that SLURM_JOB_GPU (and all the other gres/gpu variables) are not set after salloc has allocated nodes. The environment variables are only set if srun or sbatch is used. I'm wondering if that's only happening to us or if this is always the case or

Re: [slurm-users] SlurmDBD setup with mysql

2019-01-18 Thread Sajesh Singh
Fixed the problem. I had an incorrect config: slurm.conf needed the following entry, and now everything works as expected: AccountingStoragePort=7031 -- -SS- -Original Message- From: Sajesh Singh Sent: Thursday, January 17, 2019 12:26 PM To: Slurm User Community List Subject: RE: [slurm-
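
The fix above amounts to making the port that slurm.conf points at agree with the port slurmdbd listens on. A minimal Python sketch of that consistency check follows. It assumes the slurmdbd side is configured with DbdPort in slurmdbd.conf (the message does not say what the slurmdbd side used), and it assumes 6819 as the fallback when a file sets no port, which is the usual build-time default but may differ on your build. The parser is deliberately simplified to single Key=Value lines.

```python
from __future__ import annotations

# Assumed fallback when a file sets no port: 6819 is the usual
# build-time SLURMDBD_PORT default, but your build may differ.
DEFAULT_DBD_PORT = "6819"


def parse_conf(text: str) -> dict[str, str]:
    """Parse simple Key=Value lines from a Slurm-style config file.

    Comments (#...) and blank lines are skipped. Keys are lower-cased
    because Slurm parameter names are case-insensitive. Lines holding
    several Key=Value pairs (e.g. NodeName=...) are not split further;
    this sketch only needs single-parameter lines.
    """
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip().lower()] = value.strip()
    return conf


def ports_match(slurm_conf: str, slurmdbd_conf: str) -> bool:
    """True if slurm.conf's AccountingStoragePort equals slurmdbd.conf's DbdPort."""
    client = parse_conf(slurm_conf).get("accountingstorageport", DEFAULT_DBD_PORT)
    server = parse_conf(slurmdbd_conf).get("dbdport", DEFAULT_DBD_PORT)
    return client == server
```

Pass in the text of both files (their locations vary by install). A mismatch, such as a non-default DbdPort with no AccountingStoragePort in slurm.conf, is the situation the fix above resolved.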

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Ryan Novosielski
The documentation indicates you need it everywhere: https://slurm.schedmd.com/topology.conf.html "Changes to the configuration file take effect upon restart of Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the command "scontrol reconfigure" unless otherwise noted." I have

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Ryan Novosielski
> On Jan 18, 2019, at 11:53 AM, Kilian Cavalotti wrote: > > On Fri, Jan 18, 2019 at 6:31 AM Prentice Bisbal wrote: >>> Note that if you care about node weights (e.g. NodeName=whatever001 >>> Weight=2, etc. in slurm.conf), using the topology function will disable it. >>> I believe I was pro

Re: [slurm-users] RFC: Slurm Tool to Automate and Track Large Job Arrays

2019-01-18 Thread Alex Chekholko
Almost every place I've worked has built site-specific tools for managing jobs, which some people found very useful. E.g. https://github.com/StanfordBioinformatics/SJM http://clusterjob.org/ There have also been some efforts to standardize this sort of thing: https://www.commonwl.org/ I have not use

[slurm-users] Help with PMIx, Slurm, Intel MPI

2019-01-18 Thread Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
All, This is probably going to be a very basic question, but I find the need to ask. Recently the cluster I use installed UCX and PMIx, which is nice. I'm currently trying to build a stack of Open MPI 4.0.0 with the ability to see those, but until then I thought I'd try Intel MPI based on

[slurm-users] RFC: Slurm Tool to Automate and Track Large Job Arrays

2019-01-18 Thread Erik Surface
Hi, I am a Slurm end user needing to run ~250k jobs, each taking ~2-4 hrs. With the traffic on our cluster and a limit of 7000 job submissions at a time, it will take about a month to run the full set, if we are lucky. I built a generic tool (in bash, currently) that automates the tracking and subm
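
The core of such a tool is splitting the full task set into batches that each fit under the submission limit. Below is a minimal Python sketch of that planning step, not the poster's bash tool. It assumes each batch is submitted as a job array whose indices stay at 0..count-1 (so they also fit under the site's MaxArraySize), with the real task number recovered in the job script by adding an offset; TASK_OFFSET and run_task.sh are hypothetical names.

```python
from __future__ import annotations


def plan_batches(total_tasks: int, max_queued: int) -> list[tuple[int, int]]:
    """Split task IDs 0..total_tasks-1 into (offset, count) batches.

    Each batch holds at most max_queued tasks, so a single batch never
    exceeds the per-user submission limit on its own.
    """
    if total_tasks < 0 or max_queued <= 0:
        raise ValueError("total_tasks must be >= 0 and max_queued > 0")
    return [
        (start, min(max_queued, total_tasks - start))
        for start in range(0, total_tasks, max_queued)
    ]


def sbatch_args(offset: int, count: int, script: str) -> list[str]:
    """Build an sbatch command line that submits one batch as a job array.

    Array indices stay at 0..count-1; the job script is expected to add
    TASK_OFFSET (a hypothetical variable name) to SLURM_ARRAY_TASK_ID.
    """
    return [
        "sbatch",
        f"--array=0-{count - 1}",
        f"--export=ALL,TASK_OFFSET={offset}",
        script,
    ]


if __name__ == "__main__":
    batches = plan_batches(250_000, 7_000)
    print(len(batches), "batches; last:", batches[-1])
    print(" ".join(sbatch_args(*batches[1], "run_task.sh")))
```

For 250k tasks and a 7000-job limit this yields 36 batches, the last holding 5000 tasks. A driver loop would submit the next batch only once `squeue -h -r -u $USER` reports enough free slots; that polling and the tracking of failed tasks are left out here.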

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Kilian Cavalotti
On Fri, Jan 18, 2019 at 6:31 AM Prentice Bisbal wrote: > > Note that if you care about node weights (e.g. NodeName=whatever001 > > Weight=2, etc. in slurm.conf), using the topology function will disable it. > > I believe I was promised a warning about that in the future in a > > conversation wit

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Prentice Bisbal
On 01/17/2019 07:55 PM, Fulcomer, Samuel wrote: We use topology.conf to segregate architectures (Sandy->Skylake), and also to isolate individual nodes with 1Gb/s Ethernet rather than IB (older GPU nodes with deprecated IB cards). In the latter case, topology.conf had a switch entry for each no
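
The isolation trick described above (one switch entry per node) can be generated rather than hand-written. Here is a minimal Python sketch that emits topology.conf lines using the SwitchName/Nodes syntax. The switch and node names are hypothetical, and the sketch relies on our understanding of the tree topology plugin: a multi-node job cannot span switches that share no common parent, so a node on its own parentless switch only receives single-node jobs.

```python
from __future__ import annotations


def topology_conf(groups: dict[str, str], isolated: list[str]) -> str:
    """Build topology.conf text.

    groups: switch name -> Slurm hostlist expression (e.g. "sky[001-032]").
    isolated: node names that each get a private switch with no parent,
    so no multi-node job can combine them with any other node.
    """
    lines = [f"SwitchName={switch} Nodes={nodes}" for switch, nodes in groups.items()]
    for node in isolated:
        lines.append(f"SwitchName=iso-{node} Nodes={node}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: two IB islands by architecture plus two
    # Ethernet-only GPU nodes that should never be grouped.
    print(
        topology_conf(
            {"ib-sandy": "sandy[001-064]", "ib-sky": "sky[001-032]"},
            ["gpu001", "gpu002"],
        ),
        end="",
    )
```

Keeping the architecture islands as separate top-level switches also stops jobs from spanning Sandy Bridge and Skylake nodes, which matches the segregation described above.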

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Prentice Bisbal
On 01/17/2019 06:36 PM, Ryan Novosielski wrote: I don’t actually know the answer to this one, but we have it provisioned to all nodes. Note that if you care about node weights (e.g. NodeName=whatever001 Weight=2, etc. in slurm.conf), using the topology function will disable it. I believe I

[slurm-users] SLURM_JOB_GPU not set in salloc

2019-01-18 Thread Henkel
Hi all, we just found that SLURM_JOB_GPU (and all the other gres/gpu variables) are not set after salloc has allocated nodes. The environment variables are only set if srun or sbatch is used. I'm wondering if that's only happening to us or if this is always the case or could be configured to behav
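
One way to confirm the behaviour is to print the GPU-related variables once in the shell that salloc opens and again from a step launched inside the same allocation (e.g. `srun python3 gpu_env.py`, where the filename is hypothetical). The Python sketch below does that. SLURM_JOB_GPU is the name used in this thread; SLURM_JOB_GPUS and CUDA_VISIBLE_DEVICES are assumptions about what a gres/gpu allocation may export, which depends on Slurm version and configuration.

```python
import os
from typing import Dict, Mapping, Sequence

# Variables to report. SLURM_JOB_GPU is the name from the thread; the
# other two are assumptions that vary by Slurm version and gres config.
GPU_VARS = ("SLURM_JOB_GPU", "SLURM_JOB_GPUS", "CUDA_VISIBLE_DEVICES")


def gpu_env(env: Mapping[str, str], names: Sequence[str] = GPU_VARS) -> Dict[str, str]:
    """Return the subset of `names` that are set in `env`."""
    return {name: env[name] for name in names if name in env}


if __name__ == "__main__":
    found = gpu_env(os.environ)
    for name in GPU_VARS:
        print(f"{name}={found.get(name, '<unset>')}")
```

If the variables show up only in the srun output, that matches what is reported above: they are set per job step rather than in the interactive salloc shell.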

[slurm-users] How can I tell SLURM PowerUP to boot GPU Image

2019-01-18 Thread J.R. W
Hello, I have an AWS burst cluster and I was wondering if there is any new information on whether it’s possible to conditionally run different ResumeScripts or pass them an argument. I see from the PowerSave documentation, the end calls for conditional AWS im
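
Slurm calls ResumeProgram with a single argument, a hostlist expression of the nodes to power up, so one program can branch on node names itself. Below is a minimal Python sketch of that idea. It assumes GPU nodes are named with a "gpu-" prefix and CPU nodes with "cpu-"; the prefixes and image IDs are hypothetical placeholders, and the actual instance launch is site-specific and not shown. Hostlist expansion uses `scontrol show hostnames`.

```python
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical mapping from node-name prefix to machine image.
# Replace both the prefixes and the placeholder image IDs with your own.
IMAGE_BY_PREFIX = {"gpu-": "ami-GPU-PLACEHOLDER", "cpu-": "ami-CPU-PLACEHOLDER"}


def image_for(node: str, mapping: Dict[str, str] = IMAGE_BY_PREFIX) -> Optional[str]:
    """Return the image for the first prefix that matches, else None."""
    for prefix, image in mapping.items():
        if node.startswith(prefix):
            return image
    return None


def group_by_image(nodes: List[str], mapping: Dict[str, str] = IMAGE_BY_PREFIX) -> Dict[Optional[str], List[str]]:
    """Group node names by the image they should boot from."""
    groups: Dict[Optional[str], List[str]] = {}
    for node in nodes:
        groups.setdefault(image_for(node, mapping), []).append(node)
    return groups


def expand_hostlist(expr: str) -> List[str]:
    """Expand a Slurm hostlist expression, one name per output line."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", expr],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def main(hostlist: str) -> None:
    for image, nodes in group_by_image(expand_hostlist(hostlist)).items():
        # Launching instances (e.g. via the cloud provider's API) is
        # site-specific and not shown in this sketch.
        print(f"would start {','.join(nodes)} from image {image}")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Slurm passes ResumeProgram one argument: the hostlist to resume.
    main(sys.argv[1])
```

Nodes whose names match no prefix are grouped under None, so the program can log or reject them instead of guessing an image.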