https://bugs.kde.org/show_bug.cgi?id=415141

            Bug ID: 415141
           Summary: Possible leak with calling __libc_freeres before all
                    thread's tid_addresses are cleared
           Product: valgrind
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: jsew...@acm.org
          Reporter: m...@klomp.org
  Target Milestone: ---

This was reported against bugzilla.redhat.com:
https://bugzilla.redhat.com/show_bug.cgi?id=1596537

The relevant comments are:

> > Whenever a thread exits (gets to a point where valgrind knows it will fall
> > off the end of the execution path) it checks if there are any other threads
> > still alive, if there are no others alive it will tear down everything,
> > including running __libc_freeres. Otherwise it simply does an (real) _exit
> > syscall.
> 
> So there is a race here then. Valgrind needs to wait for the kernel to
> process all CLONE_CHILD_CLEARTID requests or __libc_freeres may not clean up
> all the thread stacks.
> 
> > When valgrind sees an exit syscall it will tell the thread itself to exit,
> > if it sees an exit_group syscall it will tell all threads to exit. Which
> > will then trigger the above "the last thread alive will call __libc_freeres"
> > as explained above.
> 
> ... and wait for all tid's to be cleared by the kernel before calling
> __libc_freeres.  There may be more things in the future hooked on that tid
> being cleared by the kernel upon thread death.

OK, so from http://man7.org/linux/man-pages/man2/clone.2.html

CLONE_CHILD_CLEARTID (since Linux 2.5.49)
              Clear (zero) the child thread ID at the location ctid in child
              memory when the child exits, and do a wakeup on the futex at
              that address.  The address involved may be changed by the
              set_tid_address(2) system call.  This is used by threading
              libraries.

That is certainly interesting. valgrind indeed does nothing special with the
memory pointed to by ctid, except note whether or not it is defined
before/after a clone syscall or set_tid_address. The ctid is called
child_tidptr (pointed to by ARG_CHILD_TIDPTR) in
coregrind/m_syswrap/syswrap-linux.c. This is actually given to
start_thread_NORETURN, but not used and not passed to run_a_thread_NORETURN
(probably because main_thread_wrapper_NORETURN cannot pass it on).
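
To make the CLONE_CHILD_CLEARTID behaviour from the man page concrete, here is
a minimal sketch of what a threading library relies on. It is not valgrind or
glibc code; the names child_fn and run_child_and_wait_for_cleartid are made up
for illustration, and it assumes Linux with glibc's clone() wrapper. The parent
waits on the futex at ctid until the kernel zeroes it on child exit:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Futex words are 32 bits wide, so the tid slot must be too. */
static_assert(sizeof(pid_t) == 4, "ctid must be a 32-bit futex word");

/* Child body: do nothing and exit. The kernel clears ctid afterwards. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Hypothetical helper: start a CLONE_VM child and block until the kernel
   has cleared its ctid. Returns 0 on success, -1 on failure. */
int run_child_and_wait_for_cleartid(void)
{
    static pid_t ctid;            /* shared with the child via CLONE_VM */
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    ctid = 1;                     /* any non-zero value means "still alive" */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_CHILD_CLEARTID | SIGCHLD,
                      NULL, NULL, NULL, &ctid);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* Wait for the kernel to zero ctid and do the futex wakeup on it. */
    for (;;) {
        pid_t seen = __atomic_load_n(&ctid, __ATOMIC_ACQUIRE);
        if (seen == 0)
            break;
        /* Returns early if ctid != seen or on a signal; the loop rechecks. */
        syscall(SYS_futex, &ctid, FUTEX_WAIT, seen, NULL, NULL, 0);
    }

    waitpid(pid, NULL, 0);        /* reap the child */
    free(stack);                  /* only safe now that the child is gone */
    return 0;
}
```

The point of the sketch is the last step: the stack is only freed after ctid
has been observed as zero, which is the kind of cleanup that depends on the
kernel having processed the clear.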

The issue is that this address is not owned by valgrind. And it might be hard
to wrap and emulate it, given that we would then also have to emulate the
full futex dance on it.

And when we are at the point of calling __libc_freeres we won't allow any other
thread to run anymore (since we really believe they are dead and gone and we are
the only live thread).

If we could fix the tracking (see the discrepancy between
main_thread_wrapper_NORETURN and start_thread_NORETURN above), then we might
forcibly clear the memory for each thread that has set up a ctid just before
calling __libc_freeres, to indicate that all threads are really and truly dead.
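
A rough sketch of that idea, under the assumption that each thread's ctid
address were recorded somewhere. The struct tracked_thread and the function
clear_tracked_child_tids below are hypothetical and do not exist in valgrind;
they only illustrate the "clear everything just before __libc_freeres" step:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread record; not valgrind's real thread state. */
struct tracked_thread {
    int *child_tidptr;   /* address given via clone/set_tid_address, or NULL */
};

/* Zero every recorded ctid location, as the kernel would on thread exit.
   Returns how many locations were cleared. No futex wakeup is done here,
   on the assumption that no other thread will run again at this point. */
size_t clear_tracked_child_tids(struct tracked_thread *threads, size_t count)
{
    assert(threads != NULL || count == 0);

    size_t cleared = 0;
    for (size_t i = 0; i < count; i++) {
        if (threads[i].child_tidptr != NULL) {
            *threads[i].child_tidptr = 0;    /* mark the thread as dead */
            threads[i].child_tidptr = NULL;  /* never clear it twice */
            cleared++;
        }
    }
    return cleared;
}
```

In real code the store would have to go to client memory that might already be
unmapped, so this would need the usual care around address validity; the sketch
ignores that.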
