Hi Sean,

 

Thank you for your prompt response. I made the changes you suggested, but 
slurmctld still refuses to run. Please find attached the new output of 
slurmctld -Dvvvv.

 

jb

 

 

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Sean 
Crosby
Sent: Monday, April 5, 2021 11:46 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] slurmctld error

 

Hi Jb,

 

You have set AccountingStoragePort to 3306 in slurm.conf, which is the MySQL 
port running on the DBD host.

 

AccountingStoragePort is the port for the Slurmdbd service, and not for MySQL.

 

Change AccountingStoragePort to 6819 and it should fix your issues.

 

I also think you should comment out the lines 

 

AccountingStorageUser=slurm
AccountingStoragePass=/run/munge/munge.socket.2

 

You shouldn't need those lines.
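
As a sketch of the suggestion above, the accounting section of slurm.conf might look like the following. The AccountingStorageType and AccountingStorageHost lines are assumptions inferred from the log in this thread (the slurmdbd plugin is loaded and the controller connects to se01); the attached slurm.conf itself is not reproduced here, so adjust to match your file.

```
# slurm.conf, accounting section only (sketch, not the poster's actual file)
# Assumption: accounting goes through slurmdbd, as the loaded plugin suggests.
AccountingStorageType=accounting_storage/slurmdbd
# Assumption: host name taken from the "se01:6819" line in the log.
AccountingStorageHost=se01
# Port of the slurmdbd service, not the MySQL port 3306.
AccountingStoragePort=6819
# Commented out as suggested above.
#AccountingStorageUser=slurm
#AccountingStoragePass=/run/munge/munge.socket.2
```

The MySQL port (3306) is configured on the slurmdbd side instead, via StoragePort in slurmdbd.conf.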

 

Sean

 

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

 

 

On Mon, 5 Apr 2021 at 18:03, Ioannis Botsis <ibot...@isc.tuc.gr> wrote:


Hello everyone,

 

I installed Slurm 19.05.5 from the Ubuntu repo for the first time, on a 
cluster with 44 identical nodes, but I have a problem with slurmctld.service.

 

When I try to start slurmctld I get the following message:

 

fatal: You are running with a database but for some reason we have no TRES from 
it.  This should only happen if the database is down and you don't have any 
state files

 

* Ubuntu 20.04.2 runs on the server and on all nodes (exactly the same version).
* munge 0.5.13, installed from the Ubuntu repo, is running on the server and nodes.
* mysql  Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)), installed 
  from the Ubuntu repo, is running on the server.

 

slurm.conf is the same on all nodes and on server.

 

slurmd.service is active and running on all nodes without problem.

 

mysql.service is active and running on server.

slurmdbd.service is active and running on server (slurm_acct_db created).

 

Please find attached slurm.conf, slurmdbd.conf, and the detailed output of the 
slurmctld -Dvvvv command.

 

Any hint?

 

Thanks in advance

 

jb

 

 

 

slurmctld: debug:  Log file re-opened
slurmctld: slurmctld version 19.05.5 started on cluster tuc
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/cred_munge.so
slurmctld: Munge credential signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
slurmctld: debug:  Munge authentication plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/select_cons_tres.so
slurmctld: select/cons_tres loaded with argument 4372
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/select_linear.so
slurmctld: Linear node selection plugin loaded with argument 4372
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/select_cray_aries.so
slurmctld: Cray/Aries node selection plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/select_cons_res.so
slurmctld: Consumable Resources (CR) Node Selection plugin loaded with argument 
4372
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/gres_gpu.so
slurmctld: debug:  init: Gres GPU plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: debug:  Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_energy_none.so
slurmctld: debug:  AcctGatherEnergy NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_profile_none.so
slurmctld: debug:  AcctGatherProfile NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_interconnect_none.so
slurmctld: debug:  AcctGatherInterconnect NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_filesystem_none.so
slurmctld: debug:  AcctGatherFilesystem NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug2: No acct_gather.conf file (/etc/slurm-llnl/acct_gather.conf)
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/jobacct_gather_cgroup.so
slurmctld: debug:  Job accounting gather cgroup plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/ext_sensors_none.so
slurmctld: ExtSensors NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/switch_none.so
slurmctld: debug:  switch NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug:  power_save module disabled, SuspendTime < 0
slurmctld: debug3: Trying to load plugin 
/usr/lib/x86_64-linux-gnu/slurm-wlm/accounting_storage_slurmdbd.so
slurmctld: Accounting storage SLURMDBD plugin loaded
slurmctld: debug3: Success.
slurmctld: debug2: slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 10.0.0.100:6819: 
Connection refused
slurmctld: error: slurm_persist_conn_open_without_init: failed to open 
persistent connection to se01:6819: Connection refused
slurmctld: error: slurmdbd: Sending PersistInit msg: Connection refused
slurmctld: debug:  Association database appears down, reading from state file.
slurmctld: debug:  create_mmap_buf: Failed to mmap file 
`/var/spool/slurm/ctld/last_tres`, No such device
slurmctld: debug2: No last_tres file (/var/spool/slurm/ctld/last_tres) to 
recover
slurmctld: debug:  create_mmap_buf: Failed to mmap file 
`/var/spool/slurm/ctld/assoc_mgr_state`, No such device
slurmctld: debug2: No association state file 
(/var/spool/slurm/ctld/assoc_mgr_state) to recover
slurmctld: fatal: You are running with a database but for some reason we have 
no TRES from it.  This should only happen if the database is down and you don't 
have any state files.
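
The "Connection refused" lines in the log above mean that nothing accepted the TCP connection at 10.0.0.100:6819 when slurmctld started. A minimal sketch for checking this from the slurmctld host, using only the Python standard library, is shown below. The function name port_accepts_connections is hypothetical and not part of Slurm; the address and port in the usage note are copied from the log.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something is listening on the port; it does not
    verify that the listener is slurmdbd or that authentication would work.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_accepts_connections("10.0.0.100", 6819) returning False would be consistent with the log: slurmdbd is either not running, or is listening on a different port or address than the one slurmctld is configured to use.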
