Step back from Slurm and confirm that MariaDB is up and responsive.

# mysql -uroot -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.2.9-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select table_schema, table_name from information_schema.tables;
 

    On Wednesday, November 29, 2017 10:17 AM, Bruno Santos 
<bacmsan...@gmail.com> wrote:
 

 Hi Barbara,
This is a fresh install. I have installed Slurm from source on Debian stretch 
and am now trying to set it up correctly. MariaDB is running, but I am confused 
about the database configuration. I followed a tutorial (I can no longer find 
it) that showed me how to create the database and grant it to the slurm user in 
MySQL. I haven't really done anything further than that, as running anything 
returns the same errors:

root@plantae:~# sacctmgr show user -s
sacctmgr: error: slurm_persist_conn_open: Something happened with the 
receiving/processing of the persistent connection init message to 
localhost:6819: Initial RPC not DBD_INIT
sacctmgr: error: slurmdbd: Sending PersistInit msg: No error
sacctmgr: error: slurm_persist_conn_open: Something happened with the 
receiving/processing of the persistent connection init message to 
localhost:6819: Initial RPC not DBD_INIT
sacctmgr: error: slurmdbd: Sending PersistInit msg: No error
sacctmgr: error: slurm_persist_conn_open: Something happened with the 
receiving/processing of the persistent connection init message to 
localhost:6819: Initial RPC not DBD_INIT
sacctmgr: error: slurmdbd: Sending PersistInit msg: No error
sacctmgr: error: slurmdbd: DBD_GET_USERS failure: No error
 Problem with query.
 

On 29 November 2017 at 14:46, Barbara Krašovec <barbara.kraso...@ijs.si> wrote:

Did you upgrade SLURM or is it a fresh install?
Are there any associations set? For instance, did you create the cluster with 
sacctmgr?
sacctmgr add cluster <name>
Is mariadb/mysql server running, is slurmdbd running? Is it working? Try a 
simple test, such as:
sacctmgr show user -s
If it was an upgrade, did you try to run the slurmdbd and slurmctld manually 
first:
slurmdbd -Dvvvvv
Then controller:
slurmctld -Dvvvvv
Which OS is that?
Is there a firewall/selinux/ACLs?
Cheers,
Barbara
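
The "is it running?" questions above can be answered in one pass. The following 
is only a rough sketch, assuming pgrep is installed and that the daemons use 
the process names listed; the function name daemon_status is made up for 
illustration, and MariaDB may run as "mariadbd" rather than "mysqld" on newer 
releases.

```shell
#!/bin/sh
# Sketch only: report whether each daemon in the chain has a live process.
# Process names are assumptions; adjust them to match your installation.
daemon_status() {
    # pgrep -x matches the exact process name. A missing pgrep binary is
    # reported the same way as a missing process.
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

# Check in dependency order: authentication, database, slurmdbd, controller.
for d in munged mysqld slurmdbd slurmctld; do
    daemon_status "$d"
done
```

The first "not running" line in the output is the place to start, since each 
daemon later in the list depends on the ones before it.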


On 29 Nov 2017, at 15:19, Bruno Santos <bacmsan...@gmail.com> wrote:
Thank you Barbara, 
Unfortunately, it does not seem to be a munge problem. Munge can successfully 
authenticate with the nodes. 
I have increased the verbosity level and restarted the slurmctld and now I am 
getting more information about this:

Nov 29 14:08:16 plantae slurmctld[30340]: Registering slurmctld at port 6817 
with slurmdbd.

Nov 29 14:08:16 plantae slurmctld[30340]: error: slurm_persist_conn_open: 
Something happened with the receiving/processing of the persistent connection 
init message to localhost:6819: Initial RPC not DBD_INIT

Nov 29 14:08:16 plantae slurmctld[30340]: error: slurmdbd: Sending PersistInit 
msg: No error

Nov 29 14:08:16 plantae slurmctld[30340]: error: slurm_persist_conn_open: 
Something happened with the receiving/processing of the persistent connection 
init message to localhost:6819: Initial RPC not DBD_INIT

Nov 29 14:08:16 plantae slurmctld[30340]: error: slurmdbd: Sending PersistInit 
msg: No error

Nov 29 14:08:16 plantae slurmctld[30340]: fatal: It appears you don't have any 
association data from your database.  The priority/multifactor plugin requires 
this information to run correctly.  Please check your database connection and 
try again.


The problem seems to somehow be related to slurmdbd?  I am a bit lost at this 
point, to be honest. 
Best,
Bruno

On 29 November 2017 at 14:06, Barbara Krašovec <barbara.kraso...@ijs.si> wrote:

Hello,
does munge work?
Try if decode works locally:
munge -n | unmunge
Try if decode works remotely:
munge -n | ssh <somehost_in_cluster> unmunge
It seems the munge keys do not match...
See comments inline..


On 29 Nov 2017, at 14:40, Bruno Santos <bacmsan...@gmail.com> wrote:
I actually just managed to figure that one out. 
The problem was that I had set AccountingStoragePass=magic in the slurm.conf 
file. After re-reading the documentation, it seems this is only needed if a 
different munge instance controls the logins to the database, which is not my 
case. Commenting that line out seems to have worked; however, I am now getting 
a different error: 
Nov 29 13:19:20 plantae slurmctld[29984]: Registering slurmctld at port 6817 
with slurmdbd.
Nov 29 13:19:20 plantae slurmctld[29984]: error: slurm_persist_conn_open: 
Something happened with the receiving/processing of the persistent connection 
init message to localhost:6819: Initial RPC not DBD_INIT
Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Main process exited, 
code=exited, status=1/FAILURE
Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Unit entered failed 
state.
Nov 29 13:19:20 plantae systemd[1]: slurmctld.service: Failed with result 
'exit-code'.

My slurm.conf looks like this:
# LOGGING AND ACCOUNTING
AccountingStorageHost=localhost
AccountingStorageLoc=slurm_db
#AccountingStoragePass=magic
#AccountingStoragePort=
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
ClusterName=research
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmdDebug=3


You only need:
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=<hostname>
AccountingStorageType=accounting_storage/slurmdbd
You can remove AccountingStorageLoc and AccountingStorageUser.
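
A quick way to catch the AccountingStoragePass mistake discussed in this thread 
is to check slurm.conf before restarting slurmctld. This is a minimal sketch, 
not an official tool: the function name check_storage_pass and its messages are 
made up for illustration, and it assumes keys are written with the exact 
capitalisation shown and carry no trailing comments.

```shell
#!/bin/sh
# Sketch only: with the slurmdbd accounting plugin, AccountingStoragePass is
# treated as the path of an alternate MUNGE socket, so a value that is not an
# existing socket (for example a database password) is flagged.
check_storage_pass() {
    conf=$1
    # Take the last uncommented assignment of each key (exact-case match).
    stype=$(sed -n 's/^[[:space:]]*AccountingStorageType=//p' "$conf" | tail -n 1)
    spass=$(sed -n 's/^[[:space:]]*AccountingStoragePass=//p' "$conf" | tail -n 1)

    if [ "$stype" = "accounting_storage/slurmdbd" ] \
        && [ -n "$spass" ] && [ ! -S "$spass" ]; then
        echo "warn: AccountingStoragePass=$spass is not a socket"
        return 1
    fi
    echo "ok"
}
```

Called as check_storage_pass /etc/slurm/slurm.conf (the path is an assumption; 
use wherever your slurm.conf lives), it prints "ok" once the line is commented 
out or removed.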



And the slurmdbd.conf like this:
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
#ArchiveTXN=no
#ArchiveUsage=no
# Authentication info
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2

#Database info
# slurmDBD info
DbdAddr=plantae
DbdHost=plantae
# Database info
StorageType=accounting_storage/mysql
StorageHost=localhost
SlurmUser=slurm
StoragePass=magic
StorageUser=slurm
StorageLoc=slurm_db


Thank you very much in advance. 
Best,
Bruno

Cheers,
Barbara



On 29 November 2017 at 13:28, Andy Riebs <andy.ri...@hpe.com> wrote:

  It looks like you don't have the munged daemon running.
 
 On 11/29/2017 08:01 AM, Bruno Santos wrote:
  
 Hi everyone, 
  I have set up Slurm to use slurm_db and all was working fine. However, I had 
to change slurm.conf to play with user priority, and upon restarting, 
slurmctld fails with the messages below. It seems that it is somehow trying to 
use the mysql password as a munge socket? Any idea how to solve it?  
   
Nov 29 12:56:30 plantae slurmctld[29613]: Registering slurmctld at port 6817 
with slurmdbd.
 Nov 29 12:56:32 plantae slurmctld[29613]: error: If munged is up, restart with 
--num-threads=10
 Nov 29 12:56:32 plantae slurmctld[29613]: error: Munge encode failed: Failed 
to access "magic": No such file or directory
 Nov 29 12:56:32 plantae slurmctld[29613]: error: authentication: Socket 
communication error
 Nov 29 12:56:32 plantae slurmctld[29613]: error: slurm_persist_conn_open: 
failed to send persistent connection init message to localhost:6819
 Nov 29 12:56:32 plantae slurmctld[29613]: error: slurmdbd: Sending PersistInit 
msg: Protocol authentication error
 Nov 29 12:56:34 plantae slurmctld[29613]: error: If munged is up, restart with 
--num-threads=10
 Nov 29 12:56:34 plantae slurmctld[29613]: error: Munge encode failed: Failed 
to access "magic": No such file or directory
 Nov 29 12:56:34 plantae slurmctld[29613]: error: authentication: Socket 
communication error
 Nov 29 12:56:34 plantae slurmctld[29613]: error: slurm_persist_conn_open: 
failed to send persistent connection init message to localhost:6819
 Nov 29 12:56:34 plantae slurmctld[29613]: error: slurmdbd: Sending PersistInit 
msg: Protocol authentication error
 Nov 29 12:56:36 plantae slurmctld[29613]: error: If munged is up, restart with 
--num-threads=10
 Nov 29 12:56:36 plantae slurmctld[29613]: error: Munge encode failed: Failed 
to access "magic": No such file or directory
 Nov 29 12:56:36 plantae slurmctld[29613]: error: authentication: Socket 
communication error
 Nov 29 12:56:36 plantae slurmctld[29613]: error: slurm_persist_conn_open: 
failed to send persistent connection init message to localhost:6819
 Nov 29 12:56:36 plantae slurmctld[29613]: error: slurmdbd: Sending PersistInit 
msg: Protocol authentication error
 Nov 29 12:56:36 plantae slurmctld[29613]: fatal: It appears you don't have any 
association data from your database.  The priority/multifactor plugin requires 
this information to run correctly.  Please check your database connection and 
try again.
 Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Main process exited, 
code=exited, status=1/FAILURE
 Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Unit entered failed 
state.
 Nov 29 12:56:36 plantae systemd[1]: slurmctld.service: Failed with result 
'exit-code'.
 
      
 
 











   
