Ciao Elisabetta,

On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
> Error messages are not much helping me in guessing what is going on. What
> should I check to get what is failing?

Check slurmctld.log and slurmd.log; you can find them under
/var/log/slurm-llnl.

> *PARTITION AVAIL TIMELIMIT NODES STATE NODELIST*
> *batch* up infinite 8 unk* node[01-08]*
>
>
> Running
> *systemctl status slurmctld.service*
>
> returns
>
> *slurmctld.service - Slurm controller daemon*
> *   Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)*
> *   Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s
> ago*
> *  Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS
> (code=exited, status=0/SUCCESS)*
>
> * slurmctld[2100]: cons_res: select_p_reconfigure*
> * slurmctld[2100]: cons_res: select_p_node_init*
> * slurmctld[2100]: cons_res: preparing for 1 partitions*
> * slurmctld[2100]: Running as primary controller*
> * slurmctld[2100]:
> SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0*
> * slurmctld.service start operation timed out. Terminating.*
> *Terminate signal (SIGINT or SIGTERM) received*
> * slurmctld[2100]: Saving all slurm state*
> * Failed to start Slurm controller daemon.*
> * Unit slurmctld.service entered failed state.*

Do you have a backup controller? Check your slurm.conf under
/etc/slurm-llnl.

In any case, I suggest upgrading the operating system to stretch and
fixing your configuration under a more recent version of Slurm.

Best regards
--
Gennaro Oliva
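
For reference, the two checks suggested above (reading the daemon logs and
looking for a backup controller in slurm.conf) could be sketched as the small
shell script below. It is only a sketch under assumptions: the paths are the
Debian ones mentioned in this thread, the function names show_logs and
show_controllers are made up for illustration, and the parameter names
(ControlMachine, BackupController, and SlurmctldHost on newer releases) should
be checked against the slurm.conf man page for your installed version.

```shell
# Assumed Debian locations taken from this thread; override via the
# environment if your site uses different paths.
LOG_DIR="${LOG_DIR:-/var/log/slurm-llnl}"
CONF="${CONF:-/etc/slurm-llnl/slurm.conf}"

# Print the last 50 lines of each daemon log that is readable.
show_logs() {
    for f in "$LOG_DIR/slurmctld.log" "$LOG_DIR/slurmd.log"; do
        if [ -r "$f" ]; then
            echo "== $f =="
            tail -n 50 "$f"
        else
            echo "== $f: not readable =="
        fi
    done
}

# Print the controller-related settings found in slurm.conf, if any.
show_controllers() {
    if [ -r "$CONF" ]; then
        grep -E '^[[:space:]]*(ControlMachine|ControlAddr|BackupController|BackupAddr|SlurmctldHost)=' "$CONF"
    else
        echo "$CONF: not readable"
    fi
}

show_logs
show_controllers
```

If show_controllers prints a BackupController line, it may be worth checking
that the named host is reachable and configured before restarting
slurmctld.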