[slurm-users] Autoscaling slurm
Hi,

I'm using Slurm together with cfncluster autoscaling. I have a problem and thought that you might be able to help. When I run this script (Script.sh):

    #!/bin/bash
    ./myprogram --threads=5 inputfile outputfile

the program uses 5 threads. Assuming only 1 thread runs per CPU, it needs 5 CPUs. Autoscaling starts by allocating 1 node with 1 CPU, and the program starts running on that node. Autoscaling then creates more nodes, so in the end 5 nodes are available. The problem is that the program only runs on the first slave node; the nodes created later are not used. The slurm.conf file is configured dynamically, so the NodeName entry is updated with all available nodes.

My question is whether I need to add some specific variable or configuration so that a program that needs 5 CPUs uses all of them as they become available, i.e. some way of allocating CPUs as they come online. I thought Slurm controls how tasks are sent to each CPU, so I was wondering whether I need to add an option to the script or to the slurm.conf file.

Thanks,
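One point worth checking: the threads of a single process share memory, so Slurm can only place them on one node, and nodes that autoscaling adds later cannot lend CPUs to a process that is already running. A minimal job-script sketch, assuming myprogram is purely multithreaded (no MPI), requests one task with five CPUs on a single node; the job name is illustrative. With this request the job should stay pending until some node with at least 5 CPUs is available, so five 1-CPU nodes will not satisfy it.

```shell
#!/bin/bash
#SBATCH --job-name=myprogram
#SBATCH --nodes=1          # threads of one process cannot span nodes
#SBATCH --ntasks=1         # one process...
#SBATCH --cpus-per-task=5  # ...with 5 CPUs reserved for its threads

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set.
# Fall back to 1 so the script also runs outside a Slurm allocation.
threads="${SLURM_CPUS_PER_TASK:-1}"

if [ -x ./myprogram ]; then
  ./myprogram --threads="$threads" inputfile outputfile
else
  echo "error: ./myprogram not found in $(pwd)" >&2
fi
```

Submitted with `sbatch Script.sh`, the thread count then follows the allocation instead of being hard-coded.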
[slurm-users] Dependencies problem with cfncluster
Hi folks,

I'm not sure I should post this here, but I thought you may have seen this problem before. I'm running Slurm (16.05) together with cfncluster from AWS, using autoscaling. It seems to work except for dependencies. I always get this error:

    sbatch: error: Batch job submission failed: Job dependency problem

I tested with a simple script:

    #!/bin/sh
    id=`sbatch --job-name=factor9-1 --ntasks=1 --ntasks-per-core=1 --output=out.slurmout jobscript`
    echo "ntasks 1 jobid $id"
    for n in 2 4 8 16 32 64 128; do
        id=`sbatch --depend=afterany:$id --job-name=factor9-$n --ntasks=$n --ntasks-per-core=1 --output=$n.slurmout jobscript`
        echo "ntasks $n jobid $id"
    done

jobscript file:

    #!/bin/bash
    hostname

It looks like cfncluster is not aware of job dependencies. Or is it a Slurm problem?

Thanks,
David
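A likely cause that has nothing to do with cfncluster: without extra options, sbatch prints a sentence such as "Submitted batch job 12345", so `$id` holds that whole sentence. Unquoted, `--depend=afterany:$id` then splits into `--depend=afterany:Submitted` plus stray words, which would produce exactly the "Job dependency problem" error. The sketch below reduces sbatch's output to the trailing job ID before reusing it; both function names are hypothetical. sbatch's `--parsable` option, which prints only the job ID (with `;cluster` appended on multi-cluster setups), should achieve the same without any parsing.

```shell
#!/bin/sh
# Hypothetical helper: print the last word of sbatch's default output,
# e.g. "Submitted batch job 12345" -> "12345".
job_id_from_sbatch() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

# Hypothetical corrected version of the test chain. It needs a Slurm
# cluster, so it is only defined here, not called.
submit_chain() {
  out=$(sbatch --job-name=factor9-1 --ntasks=1 --ntasks-per-core=1 \
    --output=out.slurmout jobscript) || return 1
  id=$(job_id_from_sbatch "$out")
  echo "ntasks 1 jobid $id"
  for n in 2 4 8 16 32 64 128; do
    out=$(sbatch --dependency=afterany:"$id" --job-name="factor9-$n" \
      --ntasks="$n" --ntasks-per-core=1 --output="$n.slurmout" jobscript) || return 1
    id=$(job_id_from_sbatch "$out")
    echo "ntasks $n jobid $id"
  done
}
```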
[slurm-users] LAST TASK ID
Hi,

I'm running a batch array script and would like to execute a command after the last task:

    #SBATCH --array=1-10:1%10
    sh myscript.R inputdir/file.${SLURM_ARRAY_TASK_ID}
    # I would like to run a command after the last task

For example, when I was using SGE there was something like this:

    if ($SGE_TASK_ID == $SGE_TASK_LAST) then
        # do last-task stuff here
    endif

Can I do that with Slurm?
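Slurm exports SLURM_ARRAY_TASK_MAX (alongside SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_STEP) to array tasks, which plays the role of `$SGE_TASK_LAST`; it is worth confirming under OUTPUT ENVIRONMENT VARIABLES in `man sbatch` for the installed version. A minimal sketch with a hypothetical helper name follows. One caveat: `%10` lets all ten tasks run at once, so the highest-numbered task is not necessarily the last one to finish. If the command must run only after every task has completed, a separate job submitted with `--dependency=afterany:<array job ID>` (or `afterok`) waits for the whole array instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed only in the array task whose index equals
# the highest index of the array (Slurm's SLURM_ARRAY_TASK_MAX).
is_last_array_task() {
  [ -n "${SLURM_ARRAY_TASK_ID:-}" ] && [ -n "${SLURM_ARRAY_TASK_MAX:-}" ] &&
    [ "$SLURM_ARRAY_TASK_ID" -eq "$SLURM_ARRAY_TASK_MAX" ]
}

# In the array job script, after the per-task work
# (sh myscript.R inputdir/file.${SLURM_ARRAY_TASK_ID}):
if is_last_array_task; then
  echo "task $SLURM_ARRAY_TASK_ID has the highest index: last-task stuff here"
fi
```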
[slurm-users] Allocate more memory
Hi,

I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each with 2 GB of memory, so the command sbatch --mem=3G waits for resources to become available. Can I run sbatch and tell the cluster to use 3 GB out of the 70 GB available, or does that require a particular setup? In other words, is memory restricted to each node? Or should I allocate two nodes so that I have 2 x 4 GB available?

Thanks
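On the memory question: `--mem` is a per-node limit, and a single process can only use the memory of the node it runs on. A job that needs 3 GB in one process therefore cannot run on 2 GB nodes, however many nodes exist. Only a program that spreads its data over several processes (for example with MPI) can split the requirement, and then each node needs roughly its share. The helper below is hypothetical and only does that arithmetic, rounding up. Keep in mind that a node's configured RealMemory, shown by `scontrol show node`, is usually somewhat below its physical memory.

```shell
#!/bin/sh
# Hypothetical helper: per-node --mem value in MB when a distributed job
# needing total_mb in aggregate is split evenly over `nodes` nodes.
# It does not help a single-process (non-MPI) program: --mem is per node.
mem_per_node_mb() {
  total_mb=$1
  nodes=$2
  echo $(( (total_mb + nodes - 1) / nodes ))  # integer ceiling division
}

per_node=$(mem_per_node_mb 3072 2)  # 3 GB over 2 nodes
echo "$per_node"                    # 1536
# e.g. (job.sh is a hypothetical MPI job script):
# sbatch --nodes=2 --ntasks-per-node=1 --mem=1536M job.sh
```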
[slurm-users] Multithreads config
Hi,

I have a single physical server with:

* 63 CPUs (each CPU has 16 cores)
* 480 GB total memory

    NodeNAME= Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 Procs=63 REALMEMORY=48

This configuration will not work. What should it be?

Thanks,
David
Re: [slurm-users] Multithreads config
I have included the following in slurm.conf (based on the web configurator). I have 64 CPUs, not 63.

    NodeName=obelix CPUs=64 RealMemory=48 CoresPerSocket=16 ThreadsPerCore=1 state=UNKNOWN

    $ sinfo -Nl
    sinfo: error: NodeNames=obelix CPUs=64 doesn't match Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CPUs
    Fri Feb 16 16:02:22 2018
    NODELIST   NODES PARTITION   STATE  CPUS  S:C:T   MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
    obelix     1     testq*      drained 64   4:16:1  48     0        1      (null)   Low socket*core*thre

What's wrong?

On 16/02/2018 15:39, Benjamin Redling wrote:
> On 16.02.2018 at 15:28, david martin wrote:
>> I have a single physical server with:
>> * 64 CPUs (each CPU has 16 cores)
>> * 480 GB total memory
>>
>> NodeNAME= Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 Procs=63 REALMEMORY=48
>>
>> This configuration will not work. What should it be?
>
> A proper configuration that shows a basic amount of effort went into reading the documentation.
> RTFM and use the configurator: https://slurm.schedmd.com/configurator.html
>
> You failed to define a NodeName, and apart from that, just defining a node isn't enough -- you need at least a partition that uses that node...
>
> Regards,
> Benjamin
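Two things stand out in the output above. First, CPUs=64 does not equal Sockets*CoresPerSocket*ThreadsPerCore: the error reports that product as 16, i.e. Sockets was taken as 1. Second, RealMemory is given in megabytes, so RealMemory=48 declares 48 MB. A sketch of consistent lines follows, assuming the 4 sockets x 16 cores x 1 thread layout that sinfo reports as S:C:T 4:16:1 and keeping the partition name testq from the output. The helper function is hypothetical and only does the multiplication. Running `slurmd -C` on the node prints the hardware values slurmd detects, in slurm.conf syntax, which is the safest source for these numbers. After the daemons pick up the new file, the drained node can be returned to service with `scontrol update NodeName=obelix State=RESUME`.

```shell
#!/bin/sh
# Hypothetical helper: print a NodeName line whose CPUs value is computed
# as Sockets*CoresPerSocket*ThreadsPerCore, the product the error checks.
node_line() {
  name=$1 sockets=$2 cores=$3 threads=$4 mem_mb=$5
  cpus=$((sockets * cores * threads))
  echo "NodeName=$name CPUs=$cpus Sockets=$sockets CoresPerSocket=$cores ThreadsPerCore=$threads RealMemory=$mem_mb State=UNKNOWN"
}

# 4 x 16 x 1 = 64 CPUs. RealMemory is in MB: 480 GB is at most about
# 491520 MB, so 480000 is an assumed value with some headroom; prefer the
# value "slurmd -C" reports.
node_line obelix 4 16 1 480000
# A node must also belong to a partition ("testq" as in the sinfo output).
echo "PartitionName=testq Nodes=obelix Default=YES MaxTime=INFINITE State=UP"
```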