Are you trying to start slurmd on the head node or on a compute node? Can you share your slurm.conf file?
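For context: as far as I know, slurmd works out its NodeName by comparing the machine's short hostname with the NodeName (or NodeHostname) entries in slurm.conf. When nothing matches, it exits with exactly the "Unable to determine this slurmd's NodeName" error. A common case is a head node that is not listed as a compute node. Below is a rough Python sketch that runs the same kind of check against a slurm.conf text. It is not part of Slurm, and every function name in it is made up for illustration. It only understands NodeName lines with simple bracket ranges such as node[01-08], and it ignores NodeHostname/NodeAddr aliases, so treat a mismatch as a hint rather than a verdict.

```python
import re
import socket


def split_top_level(expr):
    """Split 'node[01,03],gpu1' on commas that are outside [...] brackets."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hostlist(expr):
    """Expand a simple Slurm-style host list, e.g. 'node[01-08]'.

    Simplification: at most one [...] group per name; real Slurm
    hostlists accept more forms than this sketch does.
    """
    hosts = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", part)
        if not m:
            hosts.append(part)  # plain name, no brackets
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            if "-" in item:
                lo, hi = item.split("-", 1)
                width = len(lo)  # keep zero padding: 01..08
                for n in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{n:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{item}{suffix}")
    return hosts


def configured_nodenames(conf_text):
    """Collect every node name declared on NodeName= lines (skips DEFAULT)."""
    names = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        m = re.match(r"NodeName=(\S+)", line, re.IGNORECASE)
        if m and m.group(1).upper() != "DEFAULT":
            names.extend(expand_hostlist(m.group(1)))
    return names


def check_host(conf_text, hostname=None):
    """True if the short hostname is one of the configured NodeNames."""
    short = (hostname or socket.gethostname()).split(".")[0]
    return short in configured_nodenames(conf_text)


if __name__ == "__main__":
    # Hypothetical excerpt modelled on the sinfo output quoted below
    sample = (
        "NodeName=node[01-08] State=UNKNOWN\n"
        "PartitionName=batch Nodes=node[01-08] Default=YES State=UP\n"
    )
    print(check_host(sample, "node03.cluster.local"))  # True
    print(check_host(sample, "headnode"))              # False: not a NodeName
    print(check_host(sample))                          # depends on this machine
```

On the machine where slurmd fails, you would pass the real file instead, e.g. check_host(open("/etc/slurm-llnl/slurm.conf").read()). If that returns False, then either slurmd is not meant to run on that host (a pure head node) or the NodeName line for it is missing or spelled differently.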
Regards,
Carlos

On Mon, Jan 15, 2018 at 4:30 PM, Elisabetta Falivene <e.faliv...@ilabroma.com> wrote:

> slurmd -Dvvv says:
>
> slurmd: fatal: Unable to determine this slurmd's NodeName
>
> b
>
> 2018-01-15 15:58 GMT+01:00 Douglas Jacobsen <dmjacob...@lbl.gov>:
>
>> The fact that sinfo is responding shows that at least slurmctld is
>> running. Slurmd, on the other hand, is not. Please also get the output of
>> the slurmd log or of running "slurmd -Dvvv".
>>
>> On Jan 15, 2018 06:42, "Elisabetta Falivene" <e.faliv...@ilabroma.com>
>> wrote:
>>
>>> > Anyway, I suggest updating the operating system to stretch and fixing
>>> > your configuration under a more recent version of slurm.
>>>
>>> I think I'll get to that soon :)
>>> b
>>>
>>> 2018-01-15 14:08 GMT+01:00 Gennaro Oliva <oliv...@na.icar.cnr.it>:
>>>
>>>> Hi Elisabetta,
>>>>
>>>> On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
>>>> > The error messages aren't helping me much to figure out what is going on.
>>>> > What should I check to find out what is failing?
>>>>
>>>> Check slurmctld.log and slurmd.log; you can find them under
>>>> /var/log/slurm-llnl
>>>>
>>>> > PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>>>> > batch*       up   infinite      8   unk* node[01-08]
>>>> >
>>>> > Running
>>>> > systemctl status slurmctld.service
>>>> >
>>>> > returns
>>>> >
>>>> > slurmctld.service - Slurm controller daemon
>>>> >    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
>>>> >    Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago
>>>> >   Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
>>>> >
>>>> > slurmctld[2100]: cons_res: select_p_reconfigure
>>>> > slurmctld[2100]: cons_res: select_p_node_init
>>>> > slurmctld[2100]: cons_res: preparing for 1 partitions
>>>> > slurmctld[2100]: Running as primary controller
>>>> > slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
>>>> > slurmctld.service start operation timed out. Terminating.
>>>> > Terminate signal (SIGINT or SIGTERM) received
>>>> > slurmctld[2100]: Saving all slurm state
>>>> > Failed to start Slurm controller daemon.
>>>> > Unit slurmctld.service entered failed state.
>>>>
>>>> Do you have a backup controller?
>>>> Check your slurm.conf under:
>>>> /etc/slurm-llnl
>>>>
>>>> Anyway, I suggest updating the operating system to stretch and fixing
>>>> your configuration under a more recent version of slurm.
>>>>
>>>> Best regards
>>>> --
>>>> Gennaro Oliva

--
Carles Fenoy