Hi Johnsy,

1. Do you have an active support contract with SchedMD? AFAIK they only offer paid support.

2. The error message is pretty straightforward: slurmctld is not running. Did you start it (systemctl start slurmctld)?

3. slurmd needs to run on the node(s) you want jobs to run on as well. Since you seem to be using localhost both as the controller and as the compute node, slurmctld and slurmd both need to be running on localhost.

4. Is munge running?

5. May I ask why you're chown-ing the pid and log files? The Slurm user (typically "slurm") needs to have access to those files. Munge, for instance, checks ownership and complains if something is not correct.

6. "srun /proc/cpuinfo" will fail even if slurmctld and slurmd are running, because /proc/cpuinfo is not an executable file. You may want to insert "cat" after srun. Another simple test would be "srun hostname" (see the sketch after this list).
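A minimal sanity-check sequence for points 2-4 and 6 might look like the following. This is only a sketch: it assumes systemd units named munge, slurmctld and slurmd exist on your machine (with a source build you may first have to copy the service files from the etc/ directory of the source tree), so the unit names here are assumptions, not guarantees.

    # check whether munge and the two Slurm daemons are up (unit names assumed)
    systemctl status munge slurmctld slurmd

    # start whatever is not running yet
    systemctl start munge slurmctld slurmd

    # once the controller answers, these should all succeed
    sinfo                    # lists the partitions/nodes slurmctld knows about
    srun hostname            # simplest possible test job
    srun cat /proc/cpuinfo   # note the added "cat"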
And, just my personal opinion: if this is your first experiment with Slurm, I wouldn't change too much right from the beginning, but instead get it working first and then adapt things to your needs. Slurm is also available in the EPEL repos, so you could install it using dnf and experiment with the packaged version.

Hope this helps,
Florian

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Johnsy K. John <johnsyjo...@gmail.com>
Sent: Monday, 19 April 2021 01:43
To: sa...@schedmd.com <sa...@schedmd.com>; johnsy john <johnsyjo...@gmail.com>; slurm-us...@schedmd.com <slurm-us...@schedmd.com>
Subject: [External] [slurm-users] Slurm Configuration assistance: Unable to use srun after installation (slurm on fedora 33)

Hello SchedMD team,

I would like to use your Slurm workload manager for learning purposes. I tried installing the software (downloaded from https://www.schedmd.com/downloads.php) and followed the steps described in:

https://slurm.schedmd.com/download.html
https://slurm.schedmd.com/quickstart_admin.html

My Linux OS is Fedora 33, and I installed it while logged in as root. After installation and configuration as described on https://slurm.schedmd.com/quickstart_admin.html, I got errors when I tried to run srun. Details about the installation follow.

Using root permissions, I copied the tarball to /root/installations/ and ran:

cd /root/installations/
tar --bzip -x -f slurm-20.11.5.tar.bz2
cd slurm-20.11.5/
./configure --enable-debug --prefix=/usr/local --sysconfdir=/usr/local/etc
make
make install

The following steps are based on https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration:

mkdir /var/spool/slurmctld /var/log/slurm
chown johnsy /var/spool/slurmctld
chown johnsy /var/log/slurm
chmod 755 /var/spool/slurmctld /var/log/slurm
cp /var/run/slurmctld.pid /var/run/slurmd.pid
touch /var/log/slurm/slurmctld.log
chown johnsy /var/log/slurm/slurmctld.log
touch /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
chown johnsy /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
ldconfig -n /usr/lib64

Now when I tried an example command as a trial:

srun /proc/cpuinfo

I get the following error:

srun: error: Unable to allocate resources: Unable to contact slurm controller (connect failure)
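A "connect failure" like this one can usually be narrowed down with a few checks. A sketch, assuming the defaults from the slurm.conf below (SlurmctldPort=6817) and that the commands are run on the controller host:

    scontrol ping                       # asks slurmctld directly whether it is up
    ss -tlnp | grep 6817                # is anything listening on the controller port?
    tail /var/log/slurm/slurmctld.log   # only populated if SlurmctldLogFile points here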
My configuration file slurm.conf that I created is:

####################################################################################################
####################################################################################################
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=homepc
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=johnsy
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
#SlurmctldLogFile=
SlurmdDebug=info
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=localhost CPUs=12 Sockets=1 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
####################################################################################################
####################################################################################################
####################################################################################################
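For a node definition like the NodeName line above, slurmd can print the hardware it actually detects on the local machine, which makes mismatches between slurm.conf and the real host easy to spot. A quick check, assuming the binaries installed by make install are on PATH:

    slurmd -C              # prints this node's CPUs/Sockets/CoresPerSocket/ThreadsPerCore in slurm.conf syntax
    scontrol show config   # dumps the configuration the running controller actually loaded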