[slurm-users] failed to open persistant connection to localhost:6819
Hello,

I don't understand why I can't connect to the controller. This is a new, fresh install using slurmdbd on Ubuntu 16.04. It seems that a persistent connection to MySQL cannot be made.

*slurmctl.log:*

[2017-11-27T20:22:48.056] Job accounting information stored, but details not gathered
[2017-11-27T20:22:48.059] slurmctld version 17.02.9 started on cluster cluster
[2017-11-27T20:22:48.090] error: slurm_persist_conn_open_without_init: failed to open persistant connection to localhost:6819: Connection refused
[2017-11-27T20:22:48.090] error: slurmdbd: Sending PersistInit msg: Connection refused
[2017-11-27T20:22:48.090] error: slurm_persist_conn_open_without_init: failed to open persistant connection to localhost:6819: Connection refused
[2017-11-27T20:22:48.090] error: slurmdbd: Sending PersistInit msg: Connection refused
[2017-11-27T20:22:48.098] layouts: no layout to initialize
[2017-11-27T20:22:48.107] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
[2017-11-27T20:22:48.107] fatal: You haven't inited this storage yet.
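The "Connection refused" on 6819 and the 2002 socket error both point at services not being up yet: MySQL first, then slurmdbd, then slurmctld. A minimal check sequence, assuming the stock Ubuntu 16.04 MySQL package and systemd units for the Slurm daemons (the unit names are an assumption; start the daemons by hand if no units are installed):

# Is mysqld running and is its socket present?
sudo systemctl status mysql
ls -l /var/run/mysqld/mysqld.sock

# If not, start MySQL first, then slurmdbd, then slurmctld
sudo systemctl start mysql
sudo systemctl restart slurmdbd
sudo systemctl restart slurmctld

# slurmdbd should now be listening on 6819 and the cluster registered
ss -tlnp | grep 6819
sacctmgr show cluster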
[slurm-users] slurm conf with single machine with multi cores.
Hello,

I have installed the latest 17.11 release and my node is shown as down. I have a single physical server with 12 cores, so I am not sure the configuration below is correct. Can you help?

In slurm.conf the node is configured as follows:

NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1 ThreadsPerCore=1 Feature=local
PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP

Output from sinfo:

ubuntu@obione:~$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
testq*       up   infinite      1  down* linuxcluster

Thanks,
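For a single 12-core box, a node definition along the following lines is a reasonable starting point (a sketch only: the one-socket/12-core split and the RealMemory value should be taken from what slurmd -C prints on the node, and the partition's Nodes= must match the NodeName exactly):

# Run on the node itself; it prints a NodeName line with the detected hardware
slurmd -C

# slurm.conf, assuming one socket with 12 cores and no hyperthreading
NodeName=linuxcluster Sockets=1 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=991 State=UNKNOWN Feature=local
PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP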
Re: [slurm-users] slurm conf with single machine with multi cores.
Hi,

I have updated slurm.conf as follows:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=linuxcluster CPUs=2
PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP

I still get the testq node in down status. Any idea?

Below is the log from the db and controller:

==> /var/log/slurm/slurmctrl.log <==
[2017-11-29T16:28:30.446] slurmctld version 17.11.0 started on cluster linuxcluster
[2017-11-29T16:28:30.850] error: SelectType specified more than once, latest value used
[2017-11-29T16:28:30.851] layouts: no layout to initialize
[2017-11-29T16:28:30.855] layouts: loading entities/relations information
[2017-11-29T16:28:30.855] Recovered state of 1 nodes
[2017-11-29T16:28:30.855] Down nodes: linuxcluster
[2017-11-29T16:28:30.855] Recovered information about 0 jobs
[2017-11-29T16:28:30.855] cons_res: select_p_node_init
[2017-11-29T16:28:30.855] cons_res: preparing for 1 partitions
[2017-11-29T16:28:30.856] Recovered state of 0 reservations
[2017-11-29T16:28:30.856] _preserve_plugins: backup_controller not specified
[2017-11-29T16:28:30.856] cons_res: select_p_reconfigure
[2017-11-29T16:28:30.856] cons_res: select_p_node_init
[2017-11-29T16:28:30.856] cons_res: preparing for 1 partitions
[2017-11-29T16:28:30.856] Running as primary controller
[2017-11-29T16:28:30.856] Registering slurmctld at port 6817 with slurmdbd.
[2017-11-29T16:28:31.098] No parameter for mcs plugin, default values set
[2017-11-29T16:28:31.098] mcs: MCSParameters = (null). ondemand set.
[2017-11-29T16:29:31.169] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2

David

On Wed, 29 Nov 2017 at 15:59, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:

> Hi David,
>
> On Wed, 2017-11-29 at 14:45:06 +, david vilanova wrote:
> > Hello,
> > I have installed the latest 17.11 release and my node is shown as down.
> > I have a single physical server with 12 cores so not sure the conf below is
> > correct ?? can you help ??
> >
> > In slurm.conf the node is configured as follows:
> >
> > NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1
> > ThreadsPerCore=1 Feature=local
>
> 12 Sockets? Certainly not... 12 Cores per socket, yes.
> (IIRC CPUs shouldn't be specified if the detailed topology is given.
> You may try CPUs=12 and drop the details.)
>
> > PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP
>                             ^^ typo?
>
> Cheers,
> Steffen
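To see why Slurm has marked the node down, and to bring it back once slurmd is reachable again, the usual checks are (a sketch; the node name is taken from the configuration above):

scontrol show node linuxcluster      # the Reason= field explains the down state
sinfo -R                             # lists down/drained nodes with their reasons
scontrol update NodeName=linuxcluster State=RESUME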
Re: [slurm-users] slurm conf with single machine with multi cores.
ondemand set.
[2017-11-30T09:24:28.758] debug: mcs none plugin loaded
[2017-11-30T09:24:28.758] debug: power_save mode not enabled
[2017-11-30T09:24:31.761] debug: Spawning registration agent for linuxcluster 1 hosts
[2017-11-30T09:24:41.764] agent/is_node_resp: node:linuxcluster RPC:REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure
[2017-11-30T09:24:58.435] debug: backfill: beginning
[2017-11-30T09:24:58.435] debug: backfill: no jobs to backfill
[2017-11-30T09:25:28.435] debug: backfill: beginning
[2017-11-30T09:25:28.436] debug: backfill: no jobs to backfill
[2017-11-30T09:25:28.830] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
[2017-11-30T09:25:28.830] debug: sched: Running job scheduler
[2017-11-30T09:25:58.436] debug: backfill: beginning
[2017-11-30T09:25:58.436] debug: backfill: no jobs to backfill

ps -ef | grep slurm

ubuntu@linuxcluster:/home/dvi/$ ps -ef | grep slurm
slurm    11388     1  0 09:24 ?        00:00:00 /usr/local/sbin/slurmdbd
slurm    11430     1  0 09:24 ?        00:00:00 /usr/local/sbin/slurmctld

Any idea?

On Wed, 29 Nov 2017 at 18:21, Le Biot, Pierre-Marie <pierre-marie.leb...@hpe.com> wrote:

> Hello David,
>
> So linuxcluster is the Head node and also a Compute node?
>
> Is slurmd running?
>
> What does /var/log/slurm/slurmd.log say?
>
> Regards,
>
> Pierre-Marie Le Biot
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On Behalf Of* david vilanova
> *Sent:* Wednesday, November 29, 2017 4:33 PM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] slurm conf with single machine with multi cores.
>
> Hi,
>
> I have updated slurm.conf as follows:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU
> NodeName=linuxcluster CPUs=2
> PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP
>
> I still get the testq node in down status. Any idea?
>
> Below is the log from the db and controller:
>
> ==> /var/log/slurm/slurmctrl.log <==
> [2017-11-29T16:28:30.446] slurmctld version 17.11.0 started on cluster linuxcluster
> [2017-11-29T16:28:30.850] error: SelectType specified more than once, latest value used
> [2017-11-29T16:28:30.851] layouts: no layout to initialize
> [2017-11-29T16:28:30.855] layouts: loading entities/relations information
> [2017-11-29T16:28:30.855] Recovered state of 1 nodes
> [2017-11-29T16:28:30.855] Down nodes: linuxcluster
> [2017-11-29T16:28:30.855] Recovered information about 0 jobs
> [2017-11-29T16:28:30.855] cons_res: select_p_node_init
> [2017-11-29T16:28:30.855] cons_res: preparing for 1 partitions
> [2017-11-29T16:28:30.856] Recovered state of 0 reservations
> [2017-11-29T16:28:30.856] _preserve_plugins: backup_controller not specified
> [2017-11-29T16:28:30.856] cons_res: select_p_reconfigure
> [2017-11-29T16:28:30.856] cons_res: select_p_node_init
> [2017-11-29T16:28:30.856] cons_res: preparing for 1 partitions
> [2017-11-29T16:28:30.856] Running as primary controller
> [2017-11-29T16:28:30.856] Registering slurmctld at port 6817 with slurmdbd.
> [2017-11-29T16:28:31.098] No parameter for mcs plugin, default values set
> [2017-11-29T16:28:31.098] mcs: MCSParameters = (null). ondemand set.
> [2017-11-29T16:29:31.169] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
>
> David
>
> On Wed, 29 Nov 2017 at 15:59, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:
>
> Hi David,
>
> On Wed, 2017-11-29 at 14:45:06 +, david vilanova wrote:
> > Hello,
> > I have installed the latest 17.11 release and my node is shown as down.
> > I have a single physical server with 12 cores so not sure the conf below is
> > correct ?? can you help ??
> >
> > In slurm.conf the node is configured as follows:
> >
> > NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1
> > ThreadsPerCore=1 Feature=local
>
> 12 Sockets? Certainly not... 12 Cores per socket, yes.
> (IIRC CPUs shouldn't be specified if the detailed topology is given.
> You may try CPUs=12 and drop the details.)
>
> > PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP
>                             ^^ typo?
>
> Cheers,
> Steffen
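The ps output shows slurmdbd and slurmctld but no slurmd, which matches the registration timeouts above. A quick way to confirm, assuming slurmd was installed under /usr/local/sbin like the other daemons (a sketch, not the only way to launch it):

# Run slurmd in the foreground with verbose output to see why it won't start
sudo /usr/local/sbin/slurmd -D -vvv

# Once it starts cleanly, check its log and the node state
tail -f /var/log/slurm/slurmd.log
scontrol show node linuxcluster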
Re: [slurm-users] "command not found"
Thanks Manuel,

The shared folder between master and slave sounds like a good option. I'll go and try that one.

Thanks

On Fri, 15 Dec 2017 at 12:36, Manuel Rodríguez Pascual <manuel.rodriguez.pasc...@gmail.com> wrote:

> Hi David,
>
> The command to be executed must be present on the node where the script is run. When you submit a job in Slurm, only the script is copied to the slave node, so the data and binaries must be there prior to the script execution.
>
> There are many alternatives to deal with this situation, depending on the system and software requirements. A common practice (as far as I know, I am not an experienced sysadmin) is to install the software in shared storage and mount it both on the master node and on the slaves. This is analogous to what you are probably doing with the user $HOME.
>
> Of course this cannot always be done, and sometimes the only solution is to install a particular library or application on every node. In this case there are many solutions to automate the process.
>
> Cheers,
>
> Manuel
>
> 2017-12-15 12:21 GMT+01:00 david:
>
>> Hi,
>>
>> when running an sbatch script I get "command not found".
>>
>> The command is blast (a widely used bioinformatics tool).
>>
>> The problem comes from the fact that the blast binary is installed on the master node but not on the other nodes. When the job runs on another node, the binary is not found.
>>
>> What would be the way to deal with this situation? What is common practice?
>>
>> thanks,
>>
>> david
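As a sketch of the shared-folder approach (the /shared/apps path is an assumption; use whatever mount point is exported to the master and all compute nodes):

#!/bin/bash
#SBATCH --job-name=blast_test
#SBATCH --ntasks=1

# blast is installed once under a directory visible to every node
export PATH=/shared/apps/blast/bin:$PATH

blastn -version    # would fail with "command not found" on any node missing the mount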
Re: [slurm-users] Allocate more memory
Thanks for the quick response.

Should the following script do the trick? That is, use as many nodes as required to get at least 3 GB of total memory, even though my nodes were set up with 2 GB each?

#SBATCH --array=1-10:1%10
#SBATCH --mem-per-cpu=3000M

srun R CMD BATCH myscript.R

thanks

On 07/02/2018 15:50, Loris Bennett wrote:

> Hi David,
>
> david martin writes:
>
>> Hi,
>>
>> I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each node with 2 GB of memory.
>>
>> So the command sbatch --mem=3G will wait for resources to become available.
>>
>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB available, or is that a particular setup? Meaning, is the memory restricted to each node? Or should I allocate two nodes so that I have 2x4 GB available?
>
> Check
>
>   man sbatch
>
> You'll find that --mem means memory per node. Thus, if you specify 3GB but all the nodes have 2GB, your job will wait forever (or until you buy more RAM and reconfigure Slurm).
>
> You probably want --mem-per-cpu, which is actually more like memory per task. This is obviously only going to work if your job can actually run on more than one node, e.g. is MPI enabled.
>
> Cheers,
>
> Loris
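For comparison, a per-task request that fits 2 GB nodes by splitting the memory across tasks would look roughly like this (a sketch; it only helps if the workload can genuinely run as several tasks, e.g. via MPI, which a plain R CMD BATCH run cannot):

#SBATCH --ntasks=2
#SBATCH --mem-per-cpu=1500M   # 2 tasks x 1.5 GB, never more than 1.5 GB on any one node

srun ./my_parallel_program    # hypothetical MPI-enabled binary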
Re: [slurm-users] Allocate more memory
Yes, when working with the human genome you can easily go up to 16 GB.

On Wed, 7 Feb 2018 at 16:20, Krieger, Donald N. wrote:

> Sorry for jumping in without full knowledge of the thread.
> But it sounds like the key issue is that each job requires 3 GBytes.
> Even if that's true, won't jobs start on cores with less memory and then just page?
> Of course, as the previous post states, you must tailor your Slurm request to the physical limits of your cluster.
>
> But the real question is: do the jobs really require 3 GBytes of resident memory?
> Most code declares far more than required and then ends up running in what it actually uses.
> You can tell by running a job and viewing the memory statistics with top or something similar.
>
> Anyway - best - Don
>
> -----Original Message-----
> From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of r...@open-mpi.org
> Sent: Wednesday, February 7, 2018 10:03 AM
> To: Slurm User Community List
> Subject: Re: [slurm-users] Allocate more memory
>
> Afraid not - since you don’t have any nodes that meet the 3G requirement, you’ll just hang.
>
> > On Feb 7, 2018, at 7:01 AM, david vilanova wrote:
> >
> > Thanks for the quick response.
> >
> > Should the following script do the trick? That is, use as many nodes as required to get at least 3 GB of total memory, even though my nodes were set up with 2 GB each?
> >
> > #SBATCH --array=1-10:1%10
> >
> > #SBATCH --mem-per-cpu=3000M
> >
> > srun R CMD BATCH myscript.R
> >
> > thanks
> >
> > On 07/02/2018 15:50, Loris Bennett wrote:
> >> Hi David,
> >>
> >> david martin writes:
> >>
> >>> Hi,
> >>>
> >>> I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each node with 2 GB of memory.
> >>>
> >>> So the command sbatch --mem=3G will wait for resources to become available.
> >>>
> >>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB available, or is that a particular setup? Meaning, is the memory restricted to each node? Or should I allocate two nodes so that I have 2x4 GB available?
> >>
> >> Check
> >>
> >>   man sbatch
> >>
> >> You'll find that --mem means memory per node. Thus, if you specify
> >> 3GB but all the nodes have 2GB, your job will wait forever (or until
> >> you buy more RAM and reconfigure Slurm).
> >>
> >> You probably want --mem-per-cpu, which is actually more like memory
> >> per task. This is obviously only going to work if your job can
> >> actually run on more than one node, e.g. is MPI enabled.
> >>
> >> Cheers,
> >>
> >> Loris
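If Slurm accounting is enabled, the high-water memory use Don mentions can also be read back from Slurm itself rather than top (a sketch; 1234 is a placeholder job ID):

sacct -j 1234 --format=JobID,ReqMem,MaxRSS,Elapsed,State   # a completed job
sstat -j 1234 --format=JobID,MaxRSS,MaxVMSize              # steps of a still-running job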
Re: [slurm-users] Allocate more memory
Thanks all for your comments, I will look into that.

On Wed, 7 Feb 2018 at 16:37, Loris Bennett wrote:

> I was making the unwarranted assumption that you have multiple processes.
> So if you have a single process which needs more than 2GB, Ralph is of
> course right and there is nothing you can do.
>
> However, you are using R, so, depending on your problem, you may be able
> to make use of a package like Rmpi to allow your job to run on multiple
> nodes.
>
> Cheers,
>
> Loris
>
> "r...@open-mpi.org" writes:
>
> > Afraid not - since you don’t have any nodes that meet the 3G requirement, you’ll just hang.
> >
> >> On Feb 7, 2018, at 7:01 AM, david vilanova wrote:
> >>
> >> Thanks for the quick response.
> >>
> >> Should the following script do the trick? That is, use as many nodes as required to get at least 3 GB of total memory, even though my nodes were set up with 2 GB each?
> >>
> >> #SBATCH --array=1-10:1%10
> >>
> >> #SBATCH --mem-per-cpu=3000M
> >>
> >> srun R CMD BATCH myscript.R
> >>
> >> thanks
> >>
> >> On 07/02/2018 15:50, Loris Bennett wrote:
> >>> Hi David,
> >>>
> >>> david martin writes:
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to submit a job that requires 3 GB. The problem is that I have 70 nodes available, each node with 2 GB of memory.
> >>>>
> >>>> So the command sbatch --mem=3G will wait for resources to become available.
> >>>>
> >>>> Can I run sbatch and tell the cluster to use the 3 GB out of the 70 GB available, or is that a particular setup? Meaning, is the memory restricted to each node? Or should I allocate two nodes so that I have 2x4 GB available?
> >>>
> >>> Check
> >>>
> >>>   man sbatch
> >>>
> >>> You'll find that --mem means memory per node. Thus, if you specify 3GB
> >>> but all the nodes have 2GB, your job will wait forever (or until you buy
> >>> more RAM and reconfigure Slurm).
> >>>
> >>> You probably want --mem-per-cpu, which is actually more like memory per
> >>> task. This is obviously only going to work if your job can actually run
> >>> on more than one node, e.g. is MPI enabled.
> >>>
> >>> Cheers,
> >>>
> >>> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin          Email loris.benn...@fu-berlin.de
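If the R code can be restructured around Rmpi (or a wrapper such as pbdMPI or snow), the memory can then be spread across several 2 GB nodes. A sketch of the submission side only, assuming myscript.R has already been rewritten to use MPI workers (the exact launch line depends on how R and the MPI library were built):

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=1000M   # 4 x 1 GB spread across nodes with 2 GB each

srun Rscript myscript.R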