See: https://github.com/SchedMD/slurm/blob/master/src/slurmd/slurmstepd/mgr.c
Circa line 1072 the comment explains:

    /*
     * Need to exec() something for proctrack/linuxproc to
     * work, it will not keep a process named "slurmstepd"
     */
    execl(SLEEP_CMD, "sleep", "100000000", NULL);

Basically, proctrack/linuxproc will produce an error if a slurmstepd is running zero subprocesses. So a very long sleep command is spawned to satisfy that condition (no matter what proctrack plugin is actually being used).

> On Aug 3, 2018, at 17:42, Christopher Benjamin Coffey <chris.cof...@nau.edu> wrote:
>
> Hello,
>
> Has anyone observed "sleep 100000000" processes on their compute nodes? They
> seem to be tied to the slurmstepd extern process in slurm:
>
> 4 S root   136777      1  0  80   0 -  73218 do_wai 05:48 ?  00:00:01 slurmstepd: [13220317.extern]
> 0 S root   136782 136777  0  80   0 -  25229 hrtime 05:48 ?  00:00:00  \_ sleep 100000000
> 4 S root   136784      1  0  80   0 -  73280 do_wai 05:48 ?  00:00:02 slurmstepd: [13220317.batch]
> 4 S tes87  136789 136784  0  80   0 -  26520 do_wai 05:48 ?  00:00:00  \_ /bin/bash /var/spool/slurm/slurmd/job13220317/slurm_script
> 4 S root   136807      1  0  80   0 - 107157 do_wai 05:48 ?  00:00:01 slurmstepd: [13220317.1]
>
> I'm not exactly sure what the extern piece is for. Anyone know what this is
> all about? Is this normal? We just saw this the other day while investigating
> some issues. Sleeping for 3.17 years seems strange. Any help would be
> appreciated, thanks!
>
> Best,
> Chris
>
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167