https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118553

Jianrong Zhao <silverzhaojr at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |silverzhaojr at gmail dot com

--- Comment #6 from Jianrong Zhao <silverzhaojr at gmail dot com> ---
We recently hit this issue in production code, and I have tracked down the root
cause.

### Analysis

In a gcov build, gcc replaces every call to the "execXXX()" family of functions
with a wrapper that dumps the gcov coverage data in the child process before it
calls the real "exec()" function to load the new executable.

For example, "execv()" becomes "__gcov_execv()" in a gcov build:

//========================================

// https://github.com/gcc-mirror/gcc/blob/releases/gcc-14/libgcc/libgcov-interface.c#L304

/* A wrapper for the execv function.  Flushes the accumulated
   profiling data, so that they are not lost.  */

int
__gcov_execv (const char *path, char *const argv[])
{
  /* Dump counters only, they will be lost after exec.  */
  __gcov_dump ();
  int ret = execv (path, argv);
  /* We reach this code only when execv fails, reset counter then here.  */
  __gcov_reset ();
  return ret;
}

//========================================

gcov keeps an internal flag that records whether the coverage data has already
been dumped; "__gcov_dump()" sets this flag to true once it has run.

However, a child process created by "vfork()" shares the memory space of its
parent, so the "__gcov_dump()" call in the child actually sets the flag in the
parent process!

When the parent process later exits normally, its exit handler tries to dump the
gcov data. Because the child has already set the dump flag, the handler skips
the dump, and all coverage data collected in the parent after the "vfork()" is
lost.
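
The memory sharing behind this can be seen in a small sketch. It does not touch
gcov at all: "toy_dumped" is a hypothetical stand-in for the internal flag, and
the function name is made up for illustration. Note that POSIX leaves writes
from a "vfork()" child undefined, so this relies on platform behaviour.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for gcov's internal "already dumped" flag. */
static volatile int toy_dumped = 0;

/* Returns the flag value the parent observes after a vfork() child set it,
   or -1 if vfork() itself failed. */
int
flag_seen_by_parent_after_vfork (void)
{
  toy_dumped = 0;

  pid_t pid = vfork ();
  if (pid == 0)
    {
      /* Child: with a real vfork() this runs in the parent's address
         space until it calls _exit() or one of the exec functions. */
      toy_dumped = 1;
      _exit (0);
    }
  if (pid < 0)
    return -1;

  /* Parent: resumes only after the child has exited. */
  waitpid (pid, NULL, 0);
  return toy_dumped;
}
```

On a system where "vfork()" really shares the address space (Linux with glibc
does), the function returns 1, which is the same effect "__gcov_dump()" has on
the real flag. Where "vfork()" is implemented as a plain "fork()", it returns 0.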

### Solution

The fix is simple: call "__gcov_reset()" right after "__gcov_dump()" in the
instrumented "exec()" wrappers. This resets the gcov internal variables,
including the dump flag, and it works for both "fork()" and "vfork()":

//========================================

int
__gcov_execv (const char *path, char *const argv[])
{
  __gcov_dump ();
  __gcov_reset ();               // <== HERE
  int ret = execv (path, argv);
  return ret;
}

//========================================
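
The interaction between the dump flag and the extra reset can be sketched with
a toy model. This is not libgcov code: every "toy_" name is hypothetical and
only mimics the behaviour described in the analysis (dump is skipped once the
flag is set, reset clears the flag).

```c
#include <stdbool.h>

/* Toy model only; these names do not exist in libgcov. */
static bool toy_dumped;   /* models the internal "already dumped" flag */
static int toy_writes;    /* counts dumps that actually wrote data */

static void
toy_dump (void)
{
  if (toy_dumped)
    return;               /* flag already set: the dump is skipped */
  toy_writes++;
  toy_dumped = true;
}

static void
toy_reset (void)
{
  toy_dumped = false;     /* clears the flag so a later dump can run */
}

/* Simulates a vfork() child running the exec wrapper in shared memory,
   followed by the parent's exit handler.  Returns how many dumps were
   actually written. */
int
toy_run (bool reset_after_dump)
{
  toy_dumped = false;
  toy_writes = 0;

  /* Exec wrapper, executed by the child on the parent's memory. */
  toy_dump ();
  if (reset_after_dump)
    toy_reset ();

  /* Exit handler of the parent. */
  toy_dump ();
  return toy_writes;
}
```

In this model "toy_run(false)" returns 1, because the parent's dump is skipped,
while "toy_run(true)" returns 2, because the reset lets the parent dump again.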

### Regression

This looks like a regression introduced by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93623 . That change removed
"__gcov_flush()", which was the combination of "__gcov_dump()" and
"__gcov_reset()".

The change has been in gcc since version 11, which is why older versions such
as gcc 10 are not affected.

### Workaround

Until the issue is fixed, there is a workaround:

Call "__gcov_reset()" in the parent process right after the "vfork()" +
"exec()" sequence. This resets the gcov internal variables in the parent
process, so its coverage data is dumped successfully at exit:

//========================================

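#include <sys/types.h>
#include <unistd.h>

/* Provided by libgcov when building with --coverage. */
extern void __gcov_reset(void);

int
main(void)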
{
    pid_t pid = vfork();
    switch (pid) {
    case 0:
        execl("/bin/sh", "sh", "-c", ":", (const char *)0);
        /* FALLTHROUGH */
    case -1:
        write(2, "error\n", 6);
        _exit(1);
    }

    // we have to call __gcov_reset() in the parent process
    // to reset the gcov internal variables as a workaround
    // for "vfork()" + "exec()" in gcov build
    __gcov_reset();              // <== HERE

    write(1, "reached\n", 8);
    return 0;
}

//========================================
