
[slurm-users] Autoscaling slurm

2017-12-18 Thread david martin


Hi,

I'm using Slurm together with cfncluster autoscaling.


I have a problem and thought you might be able to help.


When I run this script:


#!/bin/bash
# Script.sh
./myprogram --threads=5 inputfile outputfile


The program uses 5 threads. Assuming only 1 thread runs per CPU, it needs
5 CPUs. Autoscaling starts by allocating 1 node with 1 CPU, and the
program starts running on that node. Autoscaling then creates more
nodes, so in the end 5 nodes are available.


The problem is that the program only runs on the first slave node; the
nodes created later are not used. The slurm.conf file is updated
dynamically so that the NodeName entries list all available nodes.



My question is whether I need to add a specific variable or configuration
option so that a program that needs 5 CPUs uses all of them as they
become available, i.e. some way of allocating CPUs as they come online.



I thought Slurm controls how tasks are sent to each CPU, so I was
wondering whether I need to add an option to the script or to the
slurm.conf file.



thanks,
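A hedged sketch for readers of this thread, not a confirmed answer: the threads of a single process share its memory and can only run on the node where that process was started, and Slurm does not move a running process onto nodes that appear later. A multithreaded program therefore has to request all of its CPUs on one node up front, e.g. with `--cpus-per-task`. The script below assumes a node with at least 5 CPUs exists or can be launched by autoscaling (with 1-CPU instances the job would simply pend); `myprogram`, `inputfile` and `outputfile` are the placeholders from the post, and the `-x` guard is only there so the sketch can be dry-run.

```shell
#!/bin/bash
#SBATCH --job-name=myprogram
#SBATCH --nodes=1            # threads of one process cannot be spread over nodes
#SBATCH --ntasks=1           # one process ...
#SBATCH --cpus-per-task=5    # ... with 5 CPUs on the same node for its threads

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# default to 1 so the script still behaves sensibly outside Slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Only launch the (placeholder) program if it is actually present.
if [ -x ./myprogram ]; then
    ./myprogram --threads="$threads" inputfile outputfile
else
    echo "./myprogram not found; would run with --threads=$threads" >&2
fi
```

Submitted with `sbatch Script.sh`, the thread count then follows the allocation instead of being hard-coded.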

[slurm-users] Dependencies problem with cfncluster

2017-12-20 Thread david martin


Hi folks,

I'm not sure I should post this here, but I thought you might have seen
this problem before.



I'm running Slurm (16.05) together with cfncluster from AWS, with
autoscaling. Everything seems to work except job dependencies.



I always get an error:

sbatch: error: Batch job submission failed: Job dependency problem


I tested with a simple script:


#!/bin/sh

id=$(sbatch --job-name=factor9-1 --ntasks=1 --ntasks-per-core=1 \
    --output=out.slurmout jobscript)
echo "ntasks 1 jobid $id"

for n in 2 4 8 16 32 64 128; do
    id=$(sbatch --depend=afterany:$id --job-name=factor9-$n --ntasks=$n \
        --ntasks-per-core=1 --output=$n.slurmout jobscript)
    echo "ntasks $n jobid $id"
done


jobscript file:

#!/bin/bash

echo "$HOSTNAME"    # bash sets HOSTNAME; lowercase $hostname is unset


It looks like cfncluster is not aware of job dependencies. Or is it a
Slurm problem?



Thanks,


David
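Something worth checking in the test script, offered as a sketch rather than a confirmed diagnosis of the error above: by default sbatch prints `Submitted batch job <id>`, so `$id` holds that whole sentence rather than a bare job ID, and the next `--depend=afterany:$id` does not receive a valid job ID. The `--parsable` option makes sbatch print only the ID (followed by `;<cluster>` on multi-cluster setups). The function names below are hypothetical; `submit_chain` is just the post's test wrapped in a function.

```shell
#!/bin/bash
# Sketch: chain jobs with dependencies using bare job IDs.

# sbatch's default output is "Submitted batch job <id>"; keep only the
# text after the last space. Not needed when --parsable is used.
job_id_from_sbatch_output() {
    printf '%s\n' "${1##* }"
}

# --parsable prints "<id>" or "<id>;<cluster>"; keep only "<id>".
strip_cluster_suffix() {
    printf '%s\n' "${1%%;*}"
}

# The test from the post, passing bare IDs to --dependency.
submit_chain() {
    local out id n
    out=$(sbatch --parsable --job-name=factor9-1 --ntasks=1 \
        --ntasks-per-core=1 --output=out.slurmout jobscript) || return 1
    id=$(strip_cluster_suffix "$out")
    echo "ntasks 1 jobid $id"
    for n in 2 4 8 16 32 64 128; do
        out=$(sbatch --parsable --dependency=afterany:"$id" \
            --job-name=factor9-$n --ntasks=$n --ntasks-per-core=1 \
            --output=$n.slurmout jobscript) || return 1
        id=$(strip_cluster_suffix "$out")
        echo "ntasks $n jobid $id"
    done
}
```

Calling `submit_chain` on the login node then submits the eight jobs, each depending on the previous one.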

[slurm-users] LAST TASK ID

2018-02-06 Thread david martin


Hi,

I'm running a batch array script and would like to execute a command
after the last task:



#SBATCH --array=1-10:1%10

Rscript myscript.R inputdir/file.${SLURM_ARRAY_TASK_ID}

# Would like to run a command after the last task

For example, when I was using SGE there was something like this:

if ($SGE_TASK_ID == $SGE_TASK_LAST) then
    # do last-task stuff here
endif


Can I do that with Slurm?
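Two possible approaches, sketched under assumptions: (a) mirrors the SGE idiom by comparing `SLURM_ARRAY_TASK_ID` with `SLURM_ARRAY_TASK_MAX`, which recent Slurm versions export to array tasks; but array tasks run concurrently, so the task with the highest index is not necessarily the last one to finish (the SGE idiom has the same weakness). (b) submits the final step as a separate job that depends on the array's job ID; Slurm applies such a dependency to every task of the array. The script names `array_job.sh` and `final_step.sh` and both function names are hypothetical.

```shell
#!/bin/bash
# Sketch: two ways to run a final step for a Slurm job array.

# (a) Closest analogue of the SGE idiom: true only in the task whose index
#     equals the highest index of the array (SLURM_ARRAY_TASK_MAX).
#     Caveat: tasks run concurrently, so this task may finish before others.
is_highest_index_task() {
    [ -n "${SLURM_ARRAY_TASK_ID:-}" ] &&
        [ "$SLURM_ARRAY_TASK_ID" = "${SLURM_ARRAY_TASK_MAX:-}" ]
}

# (b) Usually more robust: submit the final step as its own job that depends
#     on the whole array. Use afterany instead of afterok to run it even if
#     some tasks fail.
submit_array_then_final() {
    local array_id
    array_id=$(sbatch --parsable --array=1-10 array_job.sh) || return 1
    array_id=${array_id%%;*}    # drop a ";cluster" suffix, if any
    sbatch --dependency=afterok:"$array_id" final_step.sh
}
```

Inside the array script, (a) would be used as `if is_highest_index_task; then ...; fi`; (b) is run once from the login node instead of a plain `sbatch`.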

[slurm-users] Allocate more memory

2018-02-07 Thread david martin




Hi,

I would like to submit a job that requires 3 GB of memory. The problem is
that I have 70 nodes available, each with 2 GB of memory.



So the command sbatch --mem=3G waits for resources to become available.


Can I run sbatch and tell the cluster to use 3 GB out of the 140 GB
available across the 70 nodes (70 x 2 GB), or does that need a particular
setup? That is, is memory restricted to each node? Or should I allocate
two nodes so that I have 2 x 2 GB = 4 GB available?


thanks
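A sketch of the arithmetic behind this question, assuming a plain (non-MPI) program: sbatch's `--mem` is a per-node limit, and a single process can only use the memory of the node it runs on, so no combination of 2 GB nodes gives one process 3 GB. Memory spread over several nodes only helps a distributed-memory program (for example MPI), where each process stays within its own node. Both helpers are hypothetical and only do the comparison and the ceiling division.

```shell
#!/bin/bash
# Hypothetical helper: smallest number of nodes whose combined memory covers
# a request. Only meaningful for distributed-memory (e.g. MPI) programs;
# a single process is still capped at one node's memory.
nodes_needed() {
    local total_mb=$1 per_node_mb=$2
    echo $(( (total_mb + per_node_mb - 1) / per_node_mb ))   # ceiling division
}

# Does a request for one process fit on a single node?
fits_on_one_node() {
    local request_mb=$1 per_node_mb=$2
    [ "$request_mb" -le "$per_node_mb" ]
}
```

With the numbers from the post, `fits_on_one_node 3072 2048` fails and `nodes_needed 3072 2048` prints 2. An MPI job could then ask for something like `sbatch --nodes=2 --mem=1536M job.sh` (`job.sh` is hypothetical), since `--mem` applies to each node.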

[slurm-users] Multithreads config

2018-02-16 Thread david martin


Hi,

I have a single physical server with:

 * 63 cpus (each cpu has 16 cores)
 * 480 GB total memory

NodeNAME= Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 Procs=63 REALMEMORY=48

This configuration will not work. What should it be?

Thanks,

David

Re: [slurm-users] Multithreads config

2018-02-16 Thread david martin

I have included the following in slurm.conf (based on the web
configurator). I have 64 CPUs, not 63.


NodeName=obelix CPUs=64 RealMemory=48 CoresPerSocket=16 ThreadsPerCore=1 state=UNKNOWN


>sinfo -Nl


sinfo: error: NodeNames=obelix CPUs=64 doesn't match Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CPUs

Fri Feb 16 16:02:22 2018

NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
obelix         1    testq*     drained   64   4:16:1     48        0      1   (null) Low socket*core*thre




What's wrong?



On 16/02/2018 15:39, Benjamin Redling wrote:

On 16.02.2018 at 15:28, david martin wrote:

I have a single physical server with:
   * 64 cpus (each cpu has 16 cores)
   * 480 GB total memory

NodeNAME= Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 Procs=63 REALMEMORY=48

This configuration will not work. What should it be?

A proper configuration, one that shows at least a basic amount of effort
went into reading the documentation.

RTFM and use the configurator:
https://slurm.schedmd.com/configurator.html

You failed to define a NodeName, and apart from that, just defining a node
isn't enough: you need at least a partition that uses that node...

Regards,
Benjamin
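Reading the sinfo output above: the error message says Slurm computed Sockets*CoresPerSocket*ThreadsPerCore as 16, i.e. it used a single socket, while the S:C:T column shows the hardware was detected as 4:16:1 (4 x 16 x 1 = 64 CPUs), hence the drained state. Separately, RealMemory is given in megabytes, so RealMemory=48 declares 48 MB rather than 480 GB. Running `slurmd -C` on the node prints the configuration slurmd detects and is the safest source for these values. The helper below is a hypothetical sketch that only makes the arithmetic explicit and emits a matching PartitionName line, since, as Benjamin points out, a partition is required too.

```shell
#!/bin/bash
# Hypothetical helper: print a NodeName line whose counts are consistent,
# plus a PartitionName line that uses the node.
#   CPUs       = Sockets * CoresPerSocket * ThreadsPerCore
#   RealMemory = megabytes (480 GB -> 480 * 1024 MB)
# If the OS reports less memory than the nominal size, use the RealMemory
# value printed by `slurmd -C` instead, or the node may be drained.
node_conf_lines() {
    local name=$1 sockets=$2 cores=$3 threads=$4 mem_gb=$5 partition=$6
    local cpus=$(( sockets * cores * threads ))
    local mem_mb=$(( mem_gb * 1024 ))
    echo "NodeName=$name Sockets=$sockets CoresPerSocket=$cores" \
         "ThreadsPerCore=$threads CPUs=$cpus RealMemory=$mem_mb State=UNKNOWN"
    echo "PartitionName=$partition Nodes=$name Default=YES MaxTime=INFINITE State=UP"
}
```

For this server, `node_conf_lines obelix 4 16 1 480 testq` prints a NodeName line with CPUs=64 and RealMemory=491520. After replacing the existing lines in slurm.conf and restarting the daemons, the drained node can be returned to service with `scontrol update NodeName=obelix State=RESUME`.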
