Bug#851379: dietlibc FTBFS on arm64: Bus error when running tst-calloc.c

Christian Seiler Fri, 27 Jan 2017 04:22:01 -0800

On 01/26/2017 10:17 PM, Thorsten Glaser wrote:
> Christian Seiler dixit:
>> -g, and generate a backtrace? That might already help me to figure
>> out what's going on...
> 
> Recursive calls; the SIGBUS is likely a stack underflow.


So I looked at this yesterday evening and couldn't really make heads
or tails of the backtrace, so let's try this again with a fresh pair
of eyes.

I would expect stack exhaustion to cause a SIGSEGV, but not a SIGBUS.
Example:

void foo()
{
  char a[8192];
  a[0] = '\0';
  // Prevent tail-recursion optimization by the compiler:
  void (* volatile bar)() = &foo;
  bar();
  (void) a;
}

int main()
{
  foo();
  return 0;
}

This segfaults, but doesn't generate SIGBUS.
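
To check which signal is actually delivered without eyeballing the
shell's message, the same kind of recursion can be run in a child
process and the parent can inspect the wait status. This is only a
rough sketch and assumes Linux; the names recurse() and
signal_from_runaway_recursion() are made up for this mail, and the
1 MiB stack limit is an arbitrary choice so that the child dies
quickly:

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Recurse without bound. The volatile accesses keep the 8 KiB frame
 * alive, and the store after the call keeps the call out of tail
 * position even when optimizing. */
static void recurse(void)
{
    volatile char a[8192];
    void (*volatile next)(void) = &recurse;

    a[0] = '\0';
    next();
    a[1] = '\0';
}

/* Run the recursion in a child process. Returns the number of the
 * signal that killed the child, or -1 if fork/waitpid failed or the
 * child was not killed by a signal. (Hypothetical helper.) */
static int signal_from_runaway_recursion(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Assumption: a small stack limit and no core dump, so the
         * child terminates fast and leaves nothing behind. */
        struct rlimit stack = { 1 << 20, 1 << 20 };
        struct rlimit core = { 0, 0 };

        setrlimit(RLIMIT_STACK, &stack);
        setrlimit(RLIMIT_CORE, &core);
        recurse();
        _exit(0); /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

Calling signal_from_runaway_recursion() from main() and comparing
the result against SIGSEGV and SIGBUS shows which of the two the
kernel used for plain stack exhaustion.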

> (gdb) set pagination off
> (gdb) r
> Starting program: /home/tg/dietlibc-0.34~cvs20160606/debian/unittests/ttt
> [Inferior 1 (process 29556) exited normally]
> (gdb) r
> Starting program: /home/tg/dietlibc-0.34~cvs20160606/debian/unittests/ttt
> 
> Program received signal SIGBUS, Bus error.
> 0x0000000000401740 in __testandset ()
> (gdb) bt
> #0  0x0000000000401740 in __testandset ()
> #1  0x00000000004016a0 in __pthread_lock ()
> #2  0x00000000004016a0 in __pthread_lock ()
> #3  0x00000000004016a0 in __pthread_lock ()
> #4  0x00000000004016a0 in __pthread_lock ()
> #5  0x00000000004016a0 in __pthread_lock ()

I don't think this is stack exhaustion; I think the stack frame
is being corrupted here and that's why gdb can't figure out the
proper stack frame.

> (sid_arm64-dchroot)tg@asachi:~/dietlibc-0.34~cvs20160606$ gdb debian/unittests/ttt
> Breakpoint 1, 0x0000000000401674 in __pthread_lock ()
> (gdb) bt
> #0  0x0000000000401674 in __pthread_lock ()
> #1  0x0000000000400858 in __thread_find_ ()
> #2  0x0000000000400894 in __thread_self ()
> #3  0x000000000040059c in malloc ()
> #4  0x000000000040059c in malloc ()
> #5  0x000000000040059c in malloc ()
> #6  0x000000000040059c in malloc ()
> #7  0x000000000040059c in malloc ()
> […]

This would also indicate stack frame corruption (and hence gdb
being unable to properly trace this), because malloc() (see
libpthread/pthread_sys_alloc.c) does _not_ call itself directly.

> Your debian/patches/bugfixes/thread-self-vs-tcb.diff replaces the inline
> assembly implementation of __thread_self with one ending up recursively
> calling a chain malloc → __thread_self → __thread_find_ → __pthread_lock
> probably because it uses some structure that needs to be malloc(3)ed to
> work, but is needed for malloc(3) to function.

No, because __thread_self, __thread_find_ and __pthread_lock
never call malloc().

If there really were such a recursive call chain, you'd see that
whole loop repeated in the stack trace, and not just the same
function over and over again.

Whether this is stack exhaustion or not is easy to check:

Print the current stack pointer in gdb:
    print $sp

Look at /proc/$PID/maps to see in which range the stack resides.
I'd be surprised if $sp was even close to the lower end of that
address range.
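
The same comparison can also be done from inside a test program
instead of by hand in gdb. The following is only a sketch under some
assumptions: Linux with /proc mounted, code running on the main
thread (whose mapping is the one labelled "[stack]"), and the address
of a local variable standing in for $sp. The function names
find_stack_range() and stack_headroom() are made up for this mail.
Keep in mind that the "[stack]" mapping grows on demand, so its lower
end in maps is not the hard limit; that is set by RLIMIT_STACK.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the "[stack]" mapping in /proc/self/maps (Linux-specific).
 * Returns 0 and fills in *lo and *hi on success, -1 otherwise. */
static int find_stack_range(uintptr_t *lo, uintptr_t *hi)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        uint64_t start, end;

        /* Lines look like "start-end perms offset dev inode path". */
        if (strstr(line, "[stack]") &&
            sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) == 2) {
            *lo = (uintptr_t)start;
            *hi = (uintptr_t)end;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Bytes between the (approximate) stack pointer and the lower end of
 * the stack mapping, or -1 if the mapping was not found or the
 * address lies outside of it. */
static long stack_headroom(void)
{
    volatile char marker = 0; /* its address approximates $sp */
    uintptr_t sp = (uintptr_t)&marker;
    uintptr_t lo, hi;

    if (find_stack_range(&lo, &hi) != 0 || sp < lo || sp >= hi)
        return -1;
    return (long)(sp - lo);
}
```

Printing stack_headroom() at the point of interest gives roughly the
distance one would otherwise compute from "print $sp" and the maps
entry; a value of -1 would itself be a hint that the stack pointer is
not where it should be.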

Unfortunately, if the stack frame really is corrupted, I'd need
to look at this on the porterbox directly (because I don't think
debugging this via email is a productive use of either of our
time), so I'll have to wait until I get my account.

Regards,
Christian
