Re: [slurm-users] Can't start slurmdbd

2017-11-21 Thread Juan A. Cordero Varelaq
I guess mariadb-devel was not installed by the time another person 
installed slurm. I have a bunch of slurm-* rpms that I installed using "yum 
localinstall ...". Should I install them in another way, or remove slurm?


The file accounting_storage_mysql.so is, by the way, absent from the machine.
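
A minimal sketch of how the RPMs could be rebuilt so the MySQL plugin gets compiled (the tarball name assumes the 17.02.3 release mentioned later in this thread; paths are the rpmbuild defaults):

# install the MariaDB development headers before building
yum install mariadb-devel
# rebuild the Slurm RPMs from the release tarball
rpmbuild -ta slurm-17.02.3.tar.bz2
# reinstall the rebuilt packages
yum localinstall ~/rpmbuild/RPMS/x86_64/slurm-*.rpm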

Thanks
On 20/11/17 21:52, Lachlan Musicman wrote:
Also - make sure you have MariaDB-devel installed when you make the RPMs - 
that's the first bit.
The second bit is that you might have to find accounting_storage_mysql.so 
and place it in /usr/lib64/slurm.


I think it might end up in 
/path/to/rpmbuild/BUILD/sec/plugins/accounting/.libs/ or something 
like that
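
A hedged sketch of finding and installing the plugin by hand (the rpmbuild location is a guess; /usr/lib64/slurm matches the directory used elsewhere in this thread):

# locate the freshly built plugin under the rpmbuild tree
find ~/rpmbuild/BUILD -name accounting_storage_mysql.so
# copy it next to the other accounting_storage_* plugins
cp /path/printed/by/find/accounting_storage_mysql.so /usr/lib64/slurm/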


Cheers
L.

--
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic 
civics is the insistence that we cannot ignore the truth, nor should 
we panic about it. It is a shared consciousness that our institutions 
have failed and our ecosystem is collapsing, yet we are still here — 
and we are creative agents who can shape our destinies. Apocalyptic 
civics is the conviction that the only way out is through, and the 
only way through is together. "


/Greg Bloom/ @greggish 
https://twitter.com/greggish/status/873177525903609857


On 21 November 2017 at 06:35, Philip Kovacs wrote:


Try adding this to your conf:

PluginDir=/usr/lib64/slurm
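
For context, a minimal sketch of where that line could live for slurmdbd (the file path and the surrounding value are assumptions, not taken from this thread):

# /etc/slurm/slurmdbd.conf
PluginDir=/usr/lib64/slurm
StorageType=accounting_storage/mysql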


On Monday, November 20, 2017 6:48 AM, Juan A. Cordero Varelaq
<bioinformatica-i...@us.es> wrote:


I did that but got the same errors.
By the way, slurmdbd.log contains the following:

[2017-11-20T12:39:04.178] error: Couldn't find the specified
plugin name for accounting_storage/mysql looking at all files
[2017-11-20T12:39:04.179] error: cannot find
accounting_storage plugin for accounting_storage/mysql
[2017-11-20T12:39:04.179] error: cannot create
accounting_storage context for accounting_storage/mysql
[2017-11-20T12:39:04.179] fatal: Unable to initialize
accounting_storage/mysql accounting storage plugin

It seems it lacks the accounting_storage_mysql.so:

$ ls /usr/lib64/slurm/accounting_storage_*
/usr/lib64/slurm/accounting_storage_filetxt.so
/usr/lib64/slurm/accounting_storage_none.so
/usr/lib64/slurm/accounting_storage_slurmdbd.so

However, I did install the slurm-sql rpm package.
Any idea about what's failing?
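
One quick check worth trying (a suggestion, not from the original message): ask rpm whether the installed slurm-sql package actually ships the plugin:

rpm -ql slurm-sql | grep accounting_storage_mysql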

Thanks
On 20/11/17 12:11, Lachlan Musicman wrote:

On 20 November 2017 at 20:50, Juan A. Cordero Varelaq
<bioinformatica-i...@us.es> wrote:

$ systemctl start slurmdbd
Job for slurmdbd.service failed because the control
process exited with error code. See "systemctl status
slurmdbd.service" and "journalctl -xe" for details.
$ systemctl status slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/etc/systemd/system/slurmdbd.service;
enabled; vendor preset: disabled)

   Active: failed (Result: exit-code) since lun
2017-11-20 10:39:26 CET; 53s ago
  Process: 27592 ExecStart=/usr/sbin/slurmdbd
$SLURMDBD_OPTIONS (code=exited, status=1/FAILURE)

nov 20 10:39:26 login_node systemd[1]: Starting Slurm DBD
accounting daemon...
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service:
control process exited, code=exited status=1
nov 20 10:39:26 login_node systemd[1]: Failed to start
Slurm DBD accounting daemon.
nov 20 10:39:26 login_node systemd[1]: Unit
slurmdbd.service entered failed state.
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service
failed.
$ journalctl -xe
nov 20 10:39:26 login_node polkitd[1078]: Registered
Authentication Agent for unix-process:27586:119889015 (system
bus name :1.871 [/usr/bin/pkttyagent --notify-fd 5
--fallback], object path /or
nov 20 10:39:26 login_node systemd[1]: Starting Slurm DBD
accounting daemon...
-- Subject: Unit slurmdbd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--
-- Unit slurmdbd.service has begun starting up.
nov 20 10:39:26 login_node systemd[1]: slurmdbd.service:
control process exited, code=exited status=1
nov 20 10:39:26 login_node systemd[1]: Failed to start
Slurm DBD accounting daemon.
-- Subject: Unit slurmdbd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--
-- Unit slurmdbd.service has failed.
 

Re: [slurm-users] Can't start slurmdbd

2017-11-21 Thread Ole Holm Nielsen

On 11/20/2017 10:50 AM, Juan A. Cordero Varelaq wrote:
Slurm 17.02.3 was installed on my cluster some time ago but recently I 
decided to use SlurmDBD for the accounting.


After installing several packages (slurm-devel, slurm-munge, 
slurm-perlapi, slurm-plugins, slurm-slurmdbd and slurm-sql) and MariaDB 
in CentOS 7, I created an SQL database:

...

You may want to consult my Wiki for installing Slurm on CentOS/RHEL 7. 
Everything for getting started with Slurm is explained in 
https://wiki.fysik.dtu.dk/niflheim/SLURM


In particular, the database setup is described in 
https://wiki.fysik.dtu.dk/niflheim/Slurm_database
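
For completeness, a minimal sketch of the database and grant step described there (database name, user and password are placeholders; see the wiki for the full procedure):

mysql -u root -p -e "create database slurm_acct_db;"
mysql -u root -p -e "grant all on slurm_acct_db.* to 'slurm'@'localhost' identified by 'PASSWORD';"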


/Ole



Re: [slurm-users] Query about Compute + GPUs

2017-11-21 Thread Markus Köberl
On Friday, 3 November 2017 10:12:32 CET Merlin Hartley wrote:
> They would need to have different NodeNames - but the same NodeAddr, for
> example:
> 
> NodeName=fisesta-21-3 NodeAddr=10.1.21.3 CPUs=6 Weight=20485797 Feature=rack-21,6CPUs
> NodeName=fisesta-21-3-gpu NodeAddr=10.1.21.3 CPUs=2 Weight=20485797 Feature=rack-21,2CPUs Gres=gpu:1
> 
> Hope this is useful!

For me this is not working.

I have the following lines in slurm.conf:

NodeName=gpu1 NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=3 ThreadsPerCore=2 Gres=gpu:TeslaK40c:6

NodeName=gpu1-cpu NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=11 ThreadsPerCore=2

PartitionName=gpu Nodes=gpu1
PartitionName=cpu Nodes=gpu1-cpu

But if I submit to node gpu1-cpu I get the following error:

[2017-11-21T09:06:55.840] launch task 999708.0 request from 1044.1000@10.1.2.3 (port 45252)
[2017-11-21T09:06:55.840] error: Invalid job 999708.0 credential for user 1044: host gpu1 not in hostset gpu1-cpu
[2017-11-21T09:06:55.840] error: Invalid job credential from 1044@10.1.2.3: Invalid job credential

It seems I am missing something. Any ideas what that could be?
I am using slurm 16.05.9 on debian stretch.
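
One configuration detail that is sometimes relevant with this kind of setup (an assumption, not confirmed in this thread): when two NodeName entries share one physical host, each logical node needs its own slurmd instance started with an explicit node name, for example:

# one slurmd per logical node name on the shared host
# (running multiple slurmd instances typically requires a build with --enable-multiple-slurmd)
slurmd -N gpu1
slurmd -N gpu1-cpu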


regards
Markus Köberl
-- 
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at



Re: [slurm-users] Query about Compute + GPUs

2017-11-21 Thread Merlin Hartley
Could you give us your submission command?
It may be that you are requesting the wrong partition - i.e. relying on the 
default partition selection.
Try with “--partition cpu”.


M



--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom

> On 21 Nov 2017, at 09:52, Markus Köberl wrote:
> 
> On Friday, 3 November 2017 10:12:32 CET Merlin Hartley wrote:
>> They would need to have different NodeNames - but the same NodeAddr for
>> example:
>> 
>> NodeName=fisesta-21-3 NodeAddr=10.1.21.3 CPUs=6 Weight=20485797 Feature=rack-21,6CPUs
>> NodeName=fisesta-21-3-gpu NodeAddr=10.1.21.3 CPUs=2 Weight=20485797 Feature=rack-21,2CPUs Gres=gpu:1
>> 
>> Hope this is useful!
> 
> For me this is not working.
> 
> I have the following lines in slurm.conf:
> 
> NodeName=gpu1 NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=3 ThreadsPerCore=2 Gres=gpu:TeslaK40c:6
> 
> NodeName=gpu1-cpu NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=11 ThreadsPerCore=2
> 
> PartitionName=gpu Nodes=gpu1
> PartitionName=cpu Nodes=gpu1-cpu
> 
> But if I submit to node gpu1-cpu I get the following error:
> 
> [2017-11-21T09:06:55.840] launch task 999708.0 request from 1044.1000@10.1.2.3 (port 45252)
> [2017-11-21T09:06:55.840] error: Invalid job 999708.0 credential for user 1044: host gpu1 not in hostset gpu1-cpu
> [2017-11-21T09:06:55.840] error: Invalid job credential from 1044@10.1.2.3: Invalid job credential
> 
> It seems I am missing something. Any ideas what that could be?
> I am using slurm 16.05.9 on debian stretch.
> 
> 
> regards
> Markus Köberl
> -- 
> Markus Koeberl
> Graz University of Technology
> Signal Processing and Speech Communication Laboratory
> E-mail: markus.koeb...@tugraz.at 
> 



--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom



Re: [slurm-users] Query about Compute + GPUs

2017-11-21 Thread Markus Köberl
On Tuesday, 21 November 2017 10:26:53 CET Merlin Hartley wrote:
> Could you give us your submission command?
> It may be that you are requesting the wrong partition - i.e. relying on the
> default partition selection… try with “--partition cpu”

I run the following commands:

srun --gres=gpu --mem-per-cpu="5G" -w gpu1 --pty /bin/bash
-> works, partition gpu

srun --mem-per-cpu="5G" -p cpu --pty /bin/bash
-> works, I get a slot on another node which has only one NodeName entry.

srun --mem-per-cpu="5G" -p cpu -w gpu1-cpu --pty /bin/bash
-> error: Invalid job credential...

srun --mem-per-cpu="5G" -p cpu -w gpu1 --pty /bin/bash
-> error not in partition...


I am using the following options:

EnforcePartLimits=ANY
GresTypes=gpu
JobSubmitPlugins=all_partitions
ProctrackType=proctrack/cgroup
ReturnToService=2
TaskPlugin=task/cgroup
TrackWCKey=yes
InactiveLimit=3600
KillWait=1800
MinJobAge=600
OverTimeLimit=600
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
DefMemPerCPU=1000
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityFlags=ACCRUE_ALWAYS,FAIR_TREE,SMALL_RELATIVE_TO_TIME
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityFavorSmall=YES
PriorityWeightAge=50
PriorityWeightFairshare=25
PriorityWeightJobSize=50
PriorityWeightPartition=100
PriorityWeightTRES=CPU=1000,Mem=2000,Gres/gpu=3000
AccountingStorageEnforce=associations,limits,qos,WCKey
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
AccountingStorageTRES=CPU,Mem,Gres/gpu
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
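
Two quick checks that might help narrow this down (a suggestion, not part of the original message):

# confirm how slurmctld sees the aliased node and the cpu partition
scontrol show node gpu1-cpu
sinfo -N -p cpu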


regards
Markus Köberl
-- 
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at



Re: [slurm-users] Query about Compute + GPUs

2017-11-21 Thread Ing. Gonzalo E. Arroyo
I have a problem with the detection of RAM and Arch (and maybe more); check this:

NodeName=fisesta-21-3 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.01
   AvailableFeatures=rack-21,2CPUs
   ActiveFeatures=rack-21,2CPUs
   Gres=gpu:1
   NodeAddr=10.1.21.3 NodeHostName=fisesta-21-3 Version=16.05
   OS=Linux RealMemory=3950 AllocMem=0 FreeMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=259967 Weight=20479797 Owner=N/A
MCS_label=N/A
   BootTime=2017-10-30T16:39:22 SlurmdStartTime=2017-11-06T16:46:54
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


NodeName=fisesta-21-3-cpus CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=6 CPULoad=0.01
   AvailableFeatures=rack-21,6CPUs
   ActiveFeatures=rack-21,6CPUs
   Gres=(null)
   NodeAddr=10.1.21.3 NodeHostName=fisesta-21-3-cpus Version=(null)
   RealMemory=1 AllocMem=0 FreeMem=0 Sockets=6 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=20483797 Owner=N/A
MCS_label=N/A
   BootTime=None SlurmdStartTime=None
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


For your problem, please share the relevant node and partition lines. You
should also check that your users have permission to run in every partition and
node split out by this new configuration (see the sketch below).
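
A hedged example of such a check (the partition name is taken from the earlier messages in this thread):

# look for AllowGroups/AllowAccounts restrictions on the partition
scontrol show partition cpu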


On Tue, 21 Nov 2017 at 11:05, Markus Köberl <markus.koeb...@tugraz.at> wrote:

> On Tuesday, 21 November 2017 10:26:53 CET Merlin Hartley wrote:
> > Could you give us your submission command?
> > It may be that you are requesting the wrong partition - i.e. relying on
> the
> > default partition selection… try with “--partition cpu”
>
> I run the following commands:
>
> srun --gres=gpu --mem-per-cpu="5G" -w gpu1 --pty /bin/bash
> -> works, partition gpu
>
> srun --mem-per-cpu="5G" -p cpu --pty /bin/bash
> -> works, I get a slot on another node which has only one NodeName entry.
>
> srun --mem-per-cpu="5G" -p cpu -w gpu1-cpu --pty /bin/bash
> -> error: Invalid job credential...
>
> srun --mem-per-cpu="5G" -p cpu -w gpu1 --pty /bin/bash
> -> error not in partition...
>
>
> I am using the following options:
>
> EnforcePartLimits=ANY
> GresTypes=gpu
> JobSubmitPlugins=all_partitions
> ProctrackType=proctrack/cgroup
> ReturnToService=2
> TaskPlugin=task/cgroup
> TrackWCKey=yes
> InactiveLimit=3600
> KillWait=1800
> MinJobAge=600
> OverTimeLimit=600
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> DefMemPerCPU=1000
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PriorityFlags=ACCRUE_ALWAYS,FAIR_TREE,SMALL_RELATIVE_TO_TIME
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityFavorSmall=YES
> PriorityWeightAge=50
> PriorityWeightFairshare=25
> PriorityWeightJobSize=50
> PriorityWeightPartition=100
> PriorityWeightTRES=CPU=1000,Mem=2000,Gres/gpu=3000
> AccountingStorageEnforce=associations,limits,qos,WCKey
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStoreJobComment=YES
> AccountingStorageTRES=CPU,Mem,Gres/gpu
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/cgroup
>
>
> regards
> Markus Köberl
> --
> Markus Koeberl
> Graz University of Technology
> Signal Processing and Speech Communication Laboratory
> E-mail: markus.koeb...@tugraz.at
>
> --
Ing. Gonzalo Arroyo