Hello,

I am trying to use a CPU profiling tool (Google perftools) which uses libunwind
to get backtraces.  The code being profiled takes mutex locks all over the
place.  When the profiler runs, the program crashes almost immediately (usually
with an illegal instruction).  An example crash is shown below.

myhost# gdb /root/asp/bin/myexec core_7144_1327624664_myprogram
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips64-nlm-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/asp/bin/myexec...done.

warning: core file may not match specified executable file.
...
[Thread debugging using libthread_db enabled]
Core was generated by `/root/asp/bin/myexec'.
Program terminated with signal 4, Illegal instruction.
#0  0x0000005556c9d3bc in __sigprocmask (how=3, set=0x5589f15bf8, oset=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:66
66           ../sysdeps/unix/sysv/linux/sigprocmask.c: No such file or directory.
                in ../sysdeps/unix/sysv/linux/sigprocmask.c
(gdb) bt
#0  0x0000005556c9d3bc in __sigprocmask (how=3, set=0x5589f15bf8, oset=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:66
#1  0x000000555783cf10 in put_rs_cache () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#2  0x000000555783dfb4 in _ULmips_dwarf_find_save_locs () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#3  0x000000555783ecc8 in _ULmips_dwarf_step () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#4  0x0000005557837084 in _ULmips_step () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#5  0x00000055556ae04c in GetStackTraceWithContext(void**, int, int, void const*) () from /opt/thoroughbred/lib/libtcmalloc.so.0
#6  0x00000055557a6ce4 in ?? () from /opt/thoroughbred/lib/libprofiler.so.0
#7  0x00000055557a90e8 in ProfileHandler::SignalHandler(int, siginfo*, void*) () from /opt/thoroughbred/lib/libprofiler.so.0
#8  <signal handler called>
#9  0x000000555711cc44 in __lll_trylock (futex=<optimized out>) at ../ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h:137
#10 __pthread_mutex_trylock (mutex=0x555ecda230) at pthread_mutex_trylock.c:65
...
#16 0x00000055571198c8 in start_thread (arg=<optimized out>) at pthread_create.c:299
#17 0x0000005556d50bbc in __thread_start () from /opt/thoroughbred/lib/libc.so.6
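
Frames #0 and #1 look like libunwind releasing its register-state cache lock.
As far as I can tell from reading the libunwind sources (this may differ between
versions), get_rs_cache() blocks all signals with
sigprocmask(SIG_SETMASK, <full set>, &saved) before taking the cache mutex, and
put_rs_cache() unlocks the mutex and then calls sigprocmask(SIG_SETMASK, &saved,
NULL) to restore the old mask.  That matches the arguments gdb shows: on MIPS
Linux SIG_SETMASK is 3, and oset is NULL.  The sketch below reproduces only that
locking pattern; cache_lock, cache_hits and touch_cache() are hypothetical names,
not libunwind's.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-ins for libunwind's internal cache lock and state. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_hits;

/* Acquire: block every signal first, then take the mutex, so no
   asynchronous signal handler can run on this thread while the lock is held. */
static void cache_lock_acquire(sigset_t *saved)
{
    sigset_t full;
    sigfillset(&full);
    sigprocmask(SIG_SETMASK, &full, saved);
    int rc = pthread_mutex_lock(&cache_lock);
    assert(rc == 0);
    (void)rc;
}

/* Release: drop the mutex, then restore the caller's mask.  The second
   call has the same shape as frame #0 above: how=SIG_SETMASK, oset=NULL. */
static void cache_lock_release(const sigset_t *saved)
{
    int rc = pthread_mutex_unlock(&cache_lock);
    assert(rc == 0);
    (void)rc;
    sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Hypothetical cache access wrapped in the acquire/release pair;
   returns the updated hit count. */
int touch_cache(void)
{
    sigset_t saved;
    cache_lock_acquire(&saved);
    int n = ++cache_hits;
    cache_lock_release(&saved);
    return n;
}
```

POSIX leaves sigprocmask() unspecified in multithreaded programs; on Linux it
acts on the calling thread, and pthread_sigmask() is the portable equivalent.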


The Google perftools README acknowledges this problem.  Here is what it says:
> ... while tcmalloc itself works fine, the
> cpu-profiler tool is unreliable: it will sometimes work, but sometimes
> cause a segfault.  I'll explain the problem first, and then some
> workarounds.
>
> Note that this only affects the cpu-profiler, which is a
> google-perftools feature you must turn on manually by setting the
> CPUPROFILE environment variable.  If you do not turn on cpu-profiling,
> you shouldn't see any crashes due to perftools.
>
> The gory details: The underlying problem is in the backtrace()
> function, which is a built-in function in libc.
> Backtracing is fairly straightforward in the normal case, but can run
> into problems when having to backtrace across a signal frame.
> Unfortunately, the cpu-profiler uses signals in order to register a
> profiling event, so every backtrace that the profiler does crosses a
> signal frame.
>
> In our experience, the only time there is trouble is when the signal
> fires in the middle of pthread_mutex_lock.  pthread_mutex_lock is
> called quite a bit from system libraries, particularly at program
> startup and when creating a new thread.
>
> The solution: The dwarf debugging format has support for 'cfi
> annotations', which make it easy to recognize a signal frame.  Some OS
> distributions, such as Fedora and gentoo 2007.0, already have added
> cfi annotations to their libc.  A future version of libunwind should
> recognize these annotations; these systems should not see any
> crashes.
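
To make the README's description concrete, here is a minimal sketch (not the
perftools implementation) of a signal-driven CPU sampler running over code that
takes mutexes constantly.  A 1 ms ITIMER_PROF timer delivers SIGPROF to a thread
in the process, and the worker threads spend almost all their time in
pthread_mutex_trylock() and pthread_mutex_lock(), so samples regularly land
inside the mutex code, which is where frames #9 and #10 above show the signal
arriving.  The handler only counts samples; the place where perftools would walk
the stack is marked by a comment.  The names run_profiled_workload, worker and
hot_lock, the interval and the thread limit are my own assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

#define MAX_WORKERS 8

static atomic_long samples;      /* SIGPROF samples seen so far */
static atomic_int stop_workers;  /* set by the main thread to end the run */
static pthread_mutex_t hot_lock = PTHREAD_MUTEX_INITIALIZER; /* hypothetical */

/* SIGPROF handler.  perftools would capture a backtrace here (via
   libunwind); this sketch only does async-signal-safe work: a lock-free
   atomic increment. */
static void on_sigprof(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)info;
    (void)uctx; /* interrupted context; unused in this sketch */
    atomic_fetch_add(&samples, 1);
}

/* Worker that spends nearly all its time in the mutex functions,
   mirroring the trylock-then-lock path in frames #9 and #10. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_workers)) {
        if (pthread_mutex_trylock(&hot_lock) != 0)
            pthread_mutex_lock(&hot_lock);
        pthread_mutex_unlock(&hot_lock);
    }
    return NULL;
}

/* Runs up to MAX_WORKERS workers under the profiling timer until at least
   min_samples samples arrive (or about 5 s pass); returns the sample count. */
long run_profiled_workload(int nthreads, long min_samples)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    int rc;

    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigprof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGPROF, &sa, NULL);
    assert(rc == 0);

    atomic_store(&samples, 0);
    atomic_store(&stop_workers, 0);

    /* ITIMER_PROF counts CPU time of the whole process. */
    struct itimerval tick;
    memset(&tick, 0, sizeof tick);
    tick.it_interval.tv_usec = 1000;
    tick.it_value.tv_usec = 1000;
    rc = setitimer(ITIMER_PROF, &tick, NULL);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;

    /* The main thread spins too, so it also burns profiled CPU time. */
    time_t deadline = time(NULL) + 5;
    while (atomic_load(&samples) < min_samples && time(NULL) < deadline)
        continue;

    atomic_store(&stop_workers, 1);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* Stop the timer but keep the handler installed: a SIGPROF that is
       already pending would otherwise hit the default action (terminate). */
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);
    return atomic_load(&samples);
}
```

For example, run_profiled_workload(4, 10) should normally return a count of at
least 10.  Since the handler never unwinds, this should not crash by itself;
replacing the counter with a backtrace call is what would turn it into a test
case for the crash above.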

Why does libunwind choke if a profiling signal fires in the middle of
pthread_mutex_lock?  I am also not clear on the solution the perftools README
describes; can someone please explain it further?

Regards,
Sid
_______________________________________________
Libunwind-devel mailing list
Libunwind-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/libunwind-devel
