Here's a program with a couple of problems.
It runs three concurrent child processes and measures the resource usage
of each of them separately. As a dummy child I'm using /bin/sh -c
"yes >/dev/null", which I let run for a few seconds before forcibly
terminating it.

package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// child runs the dummy command for n seconds, then reports its rusage.
func child(n int, done chan int) {
	defer func() { done <- 0 }()

	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(n)*time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, "/bin/sh", "-c", "yes >/dev/null")
	err := cmd.Run()
	if err != nil {
		fmt.Printf("%d Run(): %v\n", n, err)
	}
	if cmd.ProcessState == nil {
		fmt.Printf("%d nil ProcessState\n", n)
		return
	}
	// On Unix, SysUsage() returns the *syscall.Rusage collected by wait4().
	if rusage, ok := cmd.ProcessState.SysUsage().(*syscall.Rusage); ok {
		fmt.Printf("rusage %d: Utime=%v, Stime=%v, Maxrss=%v\n", n, rusage.Utime,
			rusage.Stime, rusage.Maxrss)
	} else {
		fmt.Printf("%d no rusage\n", n)
	}
}

func main() {
	done := make(chan int)
	go child(4, done)
	go child(1, done)
	go child(2, done)
	// Wait for all three children to report.
	<-done
	<-done
	<-done
	fmt.Println("Bye!")
}

*Problem 1*: when the context timeout expires, the shell is killed, but its
descendant process ("yes") isn't. This leaves three orphaned "yes"
processes running, burning all CPU on your machine, which have to be
manually found and killed. (Aside: that's why I didn't want to post it on
play.golang.org, although I expect it has strong protections against this
sort of thing.)
When the context timeout expires, exec.CommandContext calls Process.Kill.
The documentation <https://golang.org/pkg/os/#Process.Kill> is ambiguous
about whether that sends a SIGTERM or a SIGKILL (since "kill" is both the
name of the syscall and the name of a signal). Looking at the implementation
<https://github.com/golang/go/blob/master/src/os/exec_posix.go#L65>, it
appears to send SIGKILL, which means that there's no opportunity for the
process to kill its descendants.
I'm not sure what the right solution is here, but I think it's something
about sending a signal to a process group (-pid) rather than a single
process, which could be done if the child runs in its own process group
(setpgid? setsid?)
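
For what it's worth, here is a minimal sketch of the process-group idea, assuming a Unix system where syscall.SysProcAttr has a Setpgid field. The runInGroup helper is a hypothetical name of mine. It uses its own timer rather than exec.CommandContext, so that it controls which signal is sent and to whom. It won't catch a descendant that moves itself into a different process group or session.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runInGroup (hypothetical helper) starts the command in its own process
// group. If the command has not exited after timeout, SIGKILL is sent to the
// whole group, so descendants such as "yes" are killed along with the shell.
// It reports whether the timeout fired, plus the error returned by Wait.
func runInGroup(timeout time.Duration, name string, args ...string) (bool, error) {
	cmd := exec.Command(name, args...)
	// Setpgid with the default Pgid of 0 makes the child the leader of a
	// new process group whose id equals the child's pid.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return false, err
	}

	waitErr := make(chan error, 1)
	go func() { waitErr <- cmd.Wait() }()

	select {
	case err := <-waitErr:
		return false, err
	case <-time.After(timeout):
		// A negative pid addresses the process group instead of one process.
		_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
		return true, <-waitErr
	}
}

func main() {
	timedOut, err := runInGroup(time.Second, "/bin/sh", "-c", "yes >/dev/null")
	fmt.Printf("timedOut=%v err=%v\n", timedOut, err)
}
```

Sending syscall.SIGTERM first and SIGKILL only after a grace period would be a small change to the timeout branch, if the children need a chance to clean up.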
*Problem 2*: the Utime/Stime CPU usage printed is very low. I believe it's
showing me the resource usage for the parent shell, but not the child "yes"
process. I'd like to have the resource usage for the subprocess *and* its
descendants.
As far as I can see, the usage comes from wait4() here:
<https://github.com/golang/go/blob/master/src/os/exec_unix.go#L43>. The
manpage for wait4 says:

    If rusage is not NULL, the struct rusage to which it points will
    be filled with accounting information about the child.
    See getrusage(2) for details.

However, it doesn't say whether the figures correspond to RUSAGE_CHILDREN
or RUSAGE_SELF, which getrusage() lets you specify. A bit of Googling turns
up that some systems have a wait6
<http://manpages.ubuntu.com/manpages/xenial/man2/waitpid.2freebsd.html>
which returns both forms of usage.
Although Go lets me call Getrusage()
<https://golang.org/pkg/syscall/#Getrusage> directly, this isn't much use
if there are multiple concurrent children. And as far as I can see, Go
doesn't let me fork() my own child explicitly so I could measure its
descendants separately.
Right now I'm thinking I'll have to invoke a wrapper binary, e.g.

    exec.CommandContext(ctx, "measure_resource", "real_program", "arg1", "arg2")

where "measure_resource" calls Getrusage(RUSAGE_CHILDREN) and writes the
result to stderr just before terminating, and the parent extracts this from
stderr. The wrapper could also create its own session with setsid, and/or
implement a softer timeout than the hard SIGKILL that exec.CommandContext()
generates.
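
To make the wrapper idea concrete, here is a rough sketch of what "measure_resource" might look like, assuming a Unix system. The runAndReport name and the format of the stderr line are my own invention. As I understand getrusage(2), RUSAGE_CHILDREN only covers descendants that have terminated and been waited for, so orphaned processes that are still running would not be counted. The wrapper also has to survive long enough to report, so it can't itself be the target of the SIGKILL.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// runAndReport (hypothetical helper) runs the real program, waits for it,
// and then writes the accumulated RUSAGE_CHILDREN figures to stderr.
// It returns the exit code that the wrapper should pass on.
func runAndReport(argv []string) int {
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	runErr := cmd.Run()

	// The wrapper has exactly one direct child, so RUSAGE_CHILDREN is not
	// mixed up with other concurrent children as it would be in the parent.
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_CHILDREN, &ru); err != nil {
		fmt.Fprintf(os.Stderr, "measure_resource: getrusage: %v\n", err)
	} else {
		// The line format is an assumption; the parent would parse it.
		fmt.Fprintf(os.Stderr,
			"measure_resource: utime=%d.%06ds stime=%d.%06ds maxrss=%d\n",
			ru.Utime.Sec, ru.Utime.Usec, ru.Stime.Sec, ru.Stime.Usec, ru.Maxrss)
	}

	var exitErr *exec.ExitError
	switch {
	case runErr == nil:
		return 0
	case errors.As(runErr, &exitErr):
		return exitErr.ExitCode() // -1 if the child was killed by a signal
	default:
		// The program could not be started at all.
		fmt.Fprintf(os.Stderr, "measure_resource: %v\n", runErr)
		return 127
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: measure_resource program [args...]")
		return
	}
	os.Exit(runAndReport(os.Args[1:]))
}
```

The parent would then read the child's stderr (for example through cmd.StderrPipe()) and pick out the line starting with "measure_resource:".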
Can anyone think of a cleaner solution to this?
Many thanks,
Brian.