[slurm-users] failed to open persistant connection to localhost:6819
Hello,

I don't understand why I can't connect to the controller. This is a new, fresh install using slurmdbd on Ubuntu 16.04. It seems that a persistent connection to MySQL cannot be made.

*slurmctl.log:*

[2017-11-27T20:22:48.056] Job accounting information stored, but details not gathered
[2017-11-27T20:22:48.059] slurmctld version 17.02.9 started on cluster cluster
[2017-11-27T20:22:48.090] error: slurm_persist_conn_open_without_init: failed to open persistant connection to localhost:6819: Connection refused
[2017-11-27T20:22:48.090] error: slurmdbd: Sending PersistInit msg: Connection refused
[2017-11-27T20:22:48.090] error: slurm_persist_conn_open_without_init: failed to open persistant connection to localhost:6819: Connection refused
[2017-11-27T20:22:48.090] error: slurmdbd: Sending PersistInit msg: Connection refused
[2017-11-27T20:22:48.098] layouts: no layout to initialize
[2017-11-27T20:22:48.107] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
[2017-11-27T20:22:48.107] fatal: You haven't inited this storage yet.
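The "Connection refused" on 6819 and the 2002 socket error both point at services not being up yet: MySQL first, then slurmdbd, then slurmctld. A minimal check sequence, assuming the stock Ubuntu 16.04 MySQL package and systemd units for the Slurm daemons (the unit names are an assumption; start the daemons by hand if no units are installed):

# Is mysqld running and is its socket present?
sudo systemctl status mysql
ls -l /var/run/mysqld/mysqld.sock

# If not, start MySQL first, then slurmdbd, then slurmctld
sudo systemctl start mysql
sudo systemctl restart slurmdbd
sudo systemctl restart slurmctld

# slurmdbd should now be listening on 6819 and the cluster registered
ss -tlnp | grep 6819
sacctmgr show cluster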
[slurm-users] slurm conf with single machine with multi cores.
Hello,

I have installed the latest 17.11 release and my node is shown as down. I have a single physical server with 12 cores, so I am not sure the configuration below is correct. Can you help?

In slurm.conf the node is configured as follows:

NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1 ThreadsPerCore=1 Feature=local
PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP

Output from sinfo:

ubuntu@obione:~$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
testq*       up   infinite      1  down* linuxcluster

Thanks,
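For a single 12-core box, a node definition along the following lines is a reasonable starting point (a sketch only: the one-socket/12-core split and the RealMemory value should be taken from what slurmd -C prints on the node, and the partition's Nodes= must match the NodeName exactly):

# Run on the node itself; it prints a NodeName line with the detected hardware
slurmd -C

# slurm.conf, assuming one socket with 12 cores and no hyperthreading
NodeName=linuxcluster Sockets=1 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=991 State=UNKNOWN Feature=local
PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP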
Re: [slurm-users] slurm conf with single machine with multi cores.
Hi,

I have updated slurm.conf as follows:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=linuxcluster CPUs=2
PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP

I still get the testq node in down status. Any idea?

Below is the log from the db and controller:

==> /var/log/slurm/slurmctrl.log <==
[2017-11-29T16:28:30.446] slurmctld version 17.11.0 started on cluster linuxcluster
[2017-11-29T16:28:30.850] error: SelectType specified more than once, latest value used
[2017-11-29T16:28:30.851] layouts: no layout to initialize
[2017-11-29T16:28:30.855] layouts: loading entities/relations information
[2017-11-29T16:28:30.855] Recovered state of 1 nodes
[2017-11-29T16:28:30.855] Down nodes: linuxcluster
[2017-11-29T16:28:30.855] Recovered information about 0 jobs
[2017-11-29T16:28:30.855] cons_res: select_p_node_init
[2017-11-29T16:28:30.855] cons_res: preparing for 1 partitions
[2017-11-29T16:28:30.856] Recovered state of 0 reservations
[2017-11-29T16:28:30.856] _preserve_plugins: backup_controller not specified
[2017-11-29T16:28:30.856] cons_res: select_p_reconfigure
[2017-11-29T16:28:30.856] cons_res: select_p_node_init
[2017-11-29T16:28:30.856] cons_res: preparing for 1 partitions
[2017-11-29T16:28:30.856] Running as primary controller
[2017-11-29T16:28:30.856] Registering slurmctld at port 6817 with slurmdbd.
[2017-11-29T16:28:31.098] No parameter for mcs plugin, default values set
[2017-11-29T16:28:31.098] mcs: MCSParameters = (null). ondemand set.
[2017-11-29T16:29:31.169] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2

David

On Wed, 29 Nov 2017 at 15:59, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:

> Hi David,
>
> On Wed, 2017-11-29 at 14:45:06 +, david vilanova wrote:
> > Hello,
> > I have installed the latest 17.11 release and my node is shown as down.
> > I have a single physical server with 12 cores so not sure the conf below is
> > correct ?? can you help ??
> >
> > In slurm.conf the node is configured as follows:
> >
> > NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1
> > ThreadsPerCore=1 Feature=local
>
> 12 Sockets? Certainly not... 12 Cores per socket, yes.
> (IIRC CPUs shouldn't be specified if the detailed topology is given.
> You may try CPUs=12 and drop the details.)
>
> > PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP
>                             ^^ typo?
>
> Cheers,
> Steffen
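To see why Slurm has marked the node down, and to bring it back once slurmd is reachable again, the usual checks are (a sketch; the node name is taken from the configuration above):

scontrol show node linuxcluster      # the Reason= field explains the down state
sinfo -R                             # lists down/drained nodes with their reasons
scontrol update NodeName=linuxcluster State=RESUME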
Re: [slurm-users] slurm conf with single machine with multi cores.
ondemand set.
[2017-11-30T09:24:28.758] debug: mcs none plugin loaded
[2017-11-30T09:24:28.758] debug: power_save mode not enabled
[2017-11-30T09:24:31.761] debug: Spawning registration agent for linuxcluster 1 hosts
[2017-11-30T09:24:41.764] agent/is_node_resp: node:linuxcluster RPC:REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure
[2017-11-30T09:24:58.435] debug: backfill: beginning
[2017-11-30T09:24:58.435] debug: backfill: no jobs to backfill
[2017-11-30T09:25:28.435] debug: backfill: beginning
[2017-11-30T09:25:28.436] debug: backfill: no jobs to backfill
[2017-11-30T09:25:28.830] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
[2017-11-30T09:25:28.830] debug: sched: Running job scheduler
[2017-11-30T09:25:58.436] debug: backfill: beginning
[2017-11-30T09:25:58.436] debug: backfill: no jobs to backfill

ps -ef | grep slurm

ubuntu@linuxcluster:/home/dvi/$ ps -ef | grep slurm
slurm    11388     1  0 09:24 ?        00:00:00 /usr/local/sbin/slurmdbd
slurm    11430     1  0 09:24 ?        00:00:00 /usr/local/sbin/slurmctld

Any idea?

On Wed, 29 Nov 2017 at 18:21, Le Biot, Pierre-Marie <pierre-marie.leb...@hpe.com> wrote:

> Hello David,
>
> So linuxcluster is the Head node and also a Compute node?
>
> Is slurmd running?
>
> What does /var/log/slurm/slurmd.log say?
>
> Regards,
>
> Pierre-Marie Le Biot
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On Behalf Of* david vilanova
> *Sent:* Wednesday, November 29, 2017 4:33 PM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] slurm conf with single machine with multi cores.
>
> Hi,
>
> I have updated slurm.conf as follows:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU
> NodeName=linuxcluster CPUs=2
> PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP
>
> I still get the testq node in down status. Any idea?
>
> Below is the log from the db and controller:
>
> ==> /var/log/slurm/slurmctrl.log <==
> [2017-11-29T16:28:30.446] slurmctld version 17.11.0 started on cluster linuxcluster
> [2017-11-29T16:28:30.850] error: SelectType specified more than once, latest value used
> [2017-11-29T16:28:30.851] layouts: no layout to initialize
> [2017-11-29T16:28:30.855] layouts: loading entities/relations information
> [2017-11-29T16:28:30.855] Recovered state of 1 nodes
> [2017-11-29T16:28:30.855] Down nodes: linuxcluster
> [2017-11-29T16:28:30.855] Recovered information about 0 jobs
> [2017-11-29T16:28:30.855] cons_res: select_p_node_init
> [2017-11-29T16:28:30.855] cons_res: preparing for 1 partitions
> [2017-11-29T16:28:30.856] Recovered state of 0 reservations
> [2017-11-29T16:28:30.856] _preserve_plugins: backup_controller not specified
> [2017-11-29T16:28:30.856] cons_res: select_p_reconfigure
> [2017-11-29T16:28:30.856] cons_res: select_p_node_init
> [2017-11-29T16:28:30.856] cons_res: preparing for 1 partitions
> [2017-11-29T16:28:30.856] Running as primary controller
> [2017-11-29T16:28:30.856] Registering slurmctld at port 6817 with slurmdbd.
> [2017-11-29T16:28:31.098] No parameter for mcs plugin, default values set
> [2017-11-29T16:28:31.098] mcs: MCSParameters = (null). ondemand set.
> [2017-11-29T16:29:31.169] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
>
> David
>
> On Wed, 29 Nov 2017 at 15:59, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:
>
> Hi David,
>
> On Wed, 2017-11-29 at 14:45:06 +, david vilanova wrote:
> > Hello,
> > I have installed the latest 17.11 release and my node is shown as down.
> > I have a single physical server with 12 cores so not sure the conf below is
> > correct ?? can you help ??
> >
> > In slurm.conf the node is configured as follows:
> >
> > NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1
> > ThreadsPerCore=1 Feature=local
>
> 12 Sockets? Certainly not... 12 Cores per socket, yes.
> (IIRC CPUs shouldn't be specified if the detailed topology is given.
> You may try CPUs=12 and drop the details.)
>
> > PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP
>                             ^^ typo?
>
> Cheers,
> Steffen
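The ps output shows slurmdbd and slurmctld but no slurmd, which matches the registration timeouts above. A quick way to confirm, assuming slurmd was installed under /usr/local/sbin like the other daemons (a sketch, not the only way to launch it):

# Run slurmd in the foreground with verbose output to see why it won't start
sudo /usr/local/sbin/slurmd -D -vvv

# Once it starts cleanly, check its log and the node state
tail -f /var/log/slurm/slurmd.log
scontrol show node linuxcluster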
Re: [slurm-users] "command not found"
Thanks Manuel,

The shared folder between master and slave sounds like a good option. I'll go and try that one.

Thanks

On Fri, 15 Dec 2017 at 12:36, Manuel Rodríguez Pascual <manuel.rodriguez.pasc...@gmail.com> wrote:

> Hi David,
>
> The command to be executed must be present on the node where the script is run. When you submit a job in Slurm, only the script is copied to the slave node, so the data and binaries must be there prior to the script execution.
>
> There are many alternatives to deal with this situation, depending on the system and software requirements. A common practice (as far as I know, I am not an experienced sysadmin) is to install the software in shared storage and mount it both on the master node and on the slaves. This is analogous to what you are probably doing with the user $HOME.
>
> Of course this cannot always be done, and sometimes the only solution is to install a particular library or application on every node. In this case there are many solutions to automate the process.
>
> Cheers,
>
> Manuel
>
> 2017-12-15 12:21 GMT+01:00 david:
>
>> Hi,
>>
>> when running an sbatch script I get "command not found".
>>
>> The command is blast (a widely used bioinformatics tool).
>>
>> The problem comes from the fact that the blast binary is installed on the master node but not on the other nodes. When the job runs on another node, the binary is not found.
>>
>> What would be the way to deal with this situation? What is common practice?
>>
>> thanks,
>>
>> david
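As a sketch of the shared-folder approach (the /shared/apps path is an assumption; use whatever mount point is exported to the master and all compute nodes):

#!/bin/bash
#SBATCH --job-name=blast_test
#SBATCH --ntasks=1

# blast is installed once under a directory visible to every node
export PATH=/shared/apps/blast/bin:$PATH

blastn -version    # would fail with "command not found" on any node missing the mount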
Re: [slurm-users] Allocate more memory
Thanks for the quick response.

Should the following script do the trick? That is, use as many nodes as required to get at least 3 GB of total memory, even though my nodes were set up with 2 GB each?

#SBATCH --array=1-10:1%10
#SBATCH --mem-per-cpu=3000M

srun R CMD BATCH myscript.R

thanks

On 07/02/2018 15:50, Loris Bennett wrote:

> Hi David,
>
> david martin writes:
>
>> Hi,
>>
>> I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each node with 2 GB of memory.
>>
>> So the command sbatch --mem=3G will wait for resources to become available.
>>
>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB available, or is that a particular setup? Meaning, is the memory restricted to each node? Or should I allocate two nodes so that I have 2x4 GB available?
>
> Check
>
>   man sbatch
>
> You'll find that --mem means memory per node. Thus, if you specify 3GB but all the nodes have 2GB, your job will wait forever (or until you buy more RAM and reconfigure Slurm).
>
> You probably want --mem-per-cpu, which is actually more like memory per task. This is obviously only going to work if your job can actually run on more than one node, e.g. is MPI enabled.
>
> Cheers,
>
> Loris
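For comparison, a per-task request that fits 2 GB nodes by splitting the memory across tasks would look roughly like this (a sketch; it only helps if the workload can genuinely run as several tasks, e.g. via MPI, which a plain R CMD BATCH run cannot):

#SBATCH --ntasks=2
#SBATCH --mem-per-cpu=1500M   # 2 tasks x 1.5 GB, never more than 1.5 GB on any one node

srun ./my_parallel_program    # hypothetical MPI-enabled binary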
Re: [slurm-users] Allocate more memory
Yes, when working with the human genome you can easily go up to 16 GB.

On Wed, 7 Feb 2018 at 16:20, Krieger, Donald N. wrote:

> Sorry for jumping in without full knowledge of the thread.
> But it sounds like the key issue is that each job requires 3 GBytes.
> Even if that's true, won't jobs start on cores with less memory and then just page?
> Of course, as the previous post states, you must tailor your Slurm request to the physical limits of your cluster.
>
> But the real question is: do the jobs really require 3 GBytes of resident memory?
> Most code declares far more than required and then ends up running in what it actually uses.
> You can tell by running a job and viewing the memory statistics with top or something similar.
>
> Anyway - best - Don
>
> -----Original Message-----
> From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of r...@open-mpi.org
> Sent: Wednesday, February 7, 2018 10:03 AM
> To: Slurm User Community List
> Subject: Re: [slurm-users] Allocate more memory
>
> Afraid not - since you don’t have any nodes that meet the 3G requirement, you’ll just hang.
>
> > On Feb 7, 2018, at 7:01 AM, david vilanova wrote:
> >
> > Thanks for the quick response.
> >
> > Should the following script do the trick? That is, use as many nodes as required to get at least 3 GB of total memory, even though my nodes were set up with 2 GB each?
> >
> > #SBATCH --array=1-10:1%10
> >
> > #SBATCH --mem-per-cpu=3000M
> >
> > srun R CMD BATCH myscript.R
> >
> > thanks
> >
> > On 07/02/2018 15:50, Loris Bennett wrote:
> >> Hi David,
> >>
> >> david martin writes:
> >>
> >>> Hi,
> >>>
> >>> I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each node with 2 GB of memory.
> >>>
> >>> So the command sbatch --mem=3G will wait for resources to become available.
> >>>
> >>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB available, or is that a particular setup? Meaning, is the memory restricted to each node? Or should I allocate two nodes so that I have 2x4 GB available?
> >>
> >> Check
> >>
> >>   man sbatch
> >>
> >> You'll find that --mem means memory per node. Thus, if you specify
> >> 3GB but all the nodes have 2GB, your job will wait forever (or until
> >> you buy more RAM and reconfigure Slurm).
> >>
> >> You probably want --mem-per-cpu, which is actually more like memory
> >> per task. This is obviously only going to work if your job can
> >> actually run on more than one node, e.g. is MPI enabled.
> >>
> >> Cheers,
> >>
> >> Loris
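If Slurm accounting is enabled, the high-water memory use Don mentions can also be read back from Slurm itself rather than top (a sketch; 1234 is a placeholder job ID):

sacct -j 1234 --format=JobID,ReqMem,MaxRSS,Elapsed,State   # a completed job
sstat -j 1234 --format=JobID,MaxRSS,MaxVMSize              # steps of a still-running job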
Re: [slurm-users] Allocate more memory
Thanks all for your comments, I will look into that.

On Wed, 7 Feb 2018 at 16:37, Loris Bennett wrote:

> I was making the unwarranted assumption that you have multiple processes.
> So if you have a single process which needs more than 2GB, Ralph is of
> course right and there is nothing you can do.
>
> However, you are using R, so, depending on your problem, you may be able
> to make use of a package like Rmpi to allow your job to run on multiple
> nodes.
>
> Cheers,
>
> Loris
>
> "r...@open-mpi.org" writes:
>
> > Afraid not - since you don’t have any nodes that meet the 3G requirement, you’ll just hang.
> >
> >> On Feb 7, 2018, at 7:01 AM, david vilanova wrote:
> >>
> >> Thanks for the quick response.
> >>
> >> Should the following script do the trick? That is, use as many nodes as required to get at least 3 GB of total memory, even though my nodes were set up with 2 GB each?
> >>
> >> #SBATCH --array=1-10:1%10
> >>
> >> #SBATCH --mem-per-cpu=3000M
> >>
> >> srun R CMD BATCH myscript.R
> >>
> >> thanks
> >>
> >> On 07/02/2018 15:50, Loris Bennett wrote:
> >>> Hi David,
> >>>
> >>> david martin writes:
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each node with 2 GB of memory.
> >>>>
> >>>> So the command sbatch --mem=3G will wait for resources to become available.
> >>>>
> >>>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB available, or is that a particular setup? Meaning, is the memory restricted to each node? Or should I allocate two nodes so that I have 2x4 GB available?
> >>>
> >>> Check
> >>>
> >>>   man sbatch
> >>>
> >>> You'll find that --mem means memory per node. Thus, if you specify 3GB
> >>> but all the nodes have 2GB, your job will wait forever (or until you buy
> >>> more RAM and reconfigure Slurm).
> >>>
> >>> You probably want --mem-per-cpu, which is actually more like memory per
> >>> task. This is obviously only going to work if your job can actually run
> >>> on more than one node, e.g. is MPI enabled.
> >>>
> >>> Cheers,
> >>>
> >>> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin          Email loris.benn...@fu-berlin.de
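If the R code can be restructured around Rmpi (or a wrapper such as pbdMPI or snow), the memory can then be spread across several 2 GB nodes. A sketch of the submission side only, assuming myscript.R has already been rewritten to use MPI workers (the exact launch line depends on how R and the MPI library were built):

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=1000M   # 4 x 1 GB spread across nodes with 2 GB each

srun Rscript myscript.R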