tests/backtrace-dwarf.c failure due to -freorder-blocks-and-partition

2018-08-03 Thread Martin Liška
Hello.

As briefly discussed with Mark, some tests expect 'main' to be present
in the backtrace. That's not always true on x86_64 because the
-freorder-blocks-and-partition option is on by default. One can then see:

[   88s] FAIL: run-backtrace-dwarf.sh
[   88s] 
[   88s] 
[   88s] 0x7f1fd49800cb raise
[   88s] 0x7f1fd49694e9 abort
[   88s] 0x5627fddd0188 callme
[   88s] 0x5627fddd0192 doit
[   88s] 0x5627fddd01a3 main.cold.1
[   88s] 0x7f1fd496afeb __libc_start_main
[   88s] 0x5627fddd04aa _start
[   88s] /home/abuild/rpmbuild/BUILD/elfutils-0.173/tests/backtrace-dwarf: dwfl_thread_getframes: no error
[   88s] 0x5627fddd01a3 main.cold.1

So I suggest disabling the option for the tests.
Thoughts?
Martin


Re: tests/backtrace-dwarf.c failure due to -freorder-blocks-and-partition

2018-08-03 Thread Mark Wielaard
Hi Martin,

On Fri, 2018-08-03 at 09:41 +0200, Martin Liška wrote:
> As briefly discussed with Mark, some tests expect 'main' to be present
> in the backtrace. That's not always true on x86_64 because the
> -freorder-blocks-and-partition option is on by default. One can then
> see:
> 
> [   88s] FAIL: run-backtrace-dwarf.sh
> [   88s] 
> [   88s] 
> [   88s] 0x7f1fd49800cb   raise
> [   88s] 0x7f1fd49694e9   abort
> [   88s] 0x5627fddd0188   callme
> [   88s] 0x5627fddd0192   doit
> [   88s] 0x5627fddd01a3   main.cold.1
> [   88s] 0x7f1fd496afeb   __libc_start_main
> [   88s] 0x5627fddd04aa   _start
> [   88s] /home/abuild/rpmbuild/BUILD/elfutils-0.173/tests/backtrace-dwarf: dwfl_thread_getframes: no error
> [   88s] 0x5627fddd01a3   main.cold.1
> 
> So I suggest disabling the option for the tests.
> Thoughts?

So the problem is that some tests look for a 'main' symbol.
This is imho for C based programs a natural way to see if we can unwind
to the start of the program (everything before 'main' is infrastructure
that isn't really relevant to the user). But in some cases the 'main'
symbol is munged into something else. 'main.cold.1' in this case.
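
To make the munged name concrete: a check that accepts both 'main' and a
split-off name such as 'main.cold.1' could look roughly like the sketch
below. The helper name is_main_frame is hypothetical and is not taken
from the elfutils tests. It only assumes that the cold part is named
after the function, followed by '.cold' and an optional suffix, as in
the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of elfutils: return true if SYM names
   'main' itself or a split-off cold part such as 'main.cold' or
   'main.cold.1' (the naming seen in the failing log).  */
static bool
is_main_frame (const char *sym)
{
  assert (sym != NULL);

  static const char base[] = "main";
  size_t len = sizeof base - 1;

  if (strncmp (sym, base, len) != 0)
    return false;

  const char *rest = sym + len;
  if (*rest == '\0')
    return true;                      /* exactly "main" */

  static const char cold[] = ".cold";
  size_t cold_len = sizeof cold - 1;
  if (strncmp (rest, cold, cold_len) != 0)
    return false;                     /* e.g. "mainline" */

  rest += cold_len;
  /* Accept "main.cold" and "main.cold.<suffix>".  */
  return *rest == '\0' || *rest == '.';
}
```

Whether a test should accept such names at all, or whether the unwinder
should map them back to 'main', is exactly the open question here.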

The first question is, does the program also contain a 'main' symbol?
If so, what does it cover?
Could you run eu-readelf -s tests/backtrace-dwarf | grep main?

Now if it does, the question is why didn't we see it?
Is main.cold.1 an alias? Then we probably should look harder/smarter.
Or does it not cover any of the backtrace addresses?

If there isn't one, or it isn't actually called, then the question is
whether that is actually legal. It seems that, at least for C and C++
programs, execution should start in 'main'. If it does not, is that
because gcc did an illegal transformation? Or does it only look that
way because we cannot unwind correctly (did it do some tail call)?

We could just use -fno-reorder-blocks-and-partition. But I would like
to first really understand why it is necessary.

If you could maybe post the binary somewhere for inspection that would
be great.

Thanks,

Mark


Re: tests/backtrace-dwarf.c failure due to -freorder-blocks-and-partition

2018-08-03 Thread Martin Liška
On 08/03/2018 11:46 AM, Mark Wielaard wrote:
> Hi Martin,
> 
> On Fri, 2018-08-03 at 09:41 +0200, Martin Liška wrote:
>> As briefly discussed with Mark, some tests expect 'main' to be present
>> in the backtrace. That's not always true on x86_64 because the
>> -freorder-blocks-and-partition option is on by default. One can then
>> see:
>>
>> [   88s] FAIL: run-backtrace-dwarf.sh
>> [   88s] 
>> [   88s] 
>> [   88s] 0x7f1fd49800cb  raise
>> [   88s] 0x7f1fd49694e9  abort
>> [   88s] 0x5627fddd0188  callme
>> [   88s] 0x5627fddd0192  doit
>> [   88s] 0x5627fddd01a3  main.cold.1
>> [   88s] 0x7f1fd496afeb  __libc_start_main
>> [   88s] 0x5627fddd04aa  _start
>> [   88s] /home/abuild/rpmbuild/BUILD/elfutils-0.173/tests/backtrace-dwarf: dwfl_thread_getframes: no error
>> [   88s] 0x5627fddd01a3  main.cold.1
>>
>> So I suggest disabling the option for the tests.
>> Thoughts?
> 
> So the problem is that some tests look for a 'main' symbol.
> This is imho for C based programs a natural way to see if we can unwind
> to the start of the program (everything before 'main' is infrastructure
> that isn't really relevant to the user). But in some cases the 'main'
> symbol is munged into something else. 'main.cold.1' in this case.
> 
> The first question is, does the program also contain a 'main' symbol?
> If so, what does it cover?
> Could you run eu-readelf -s tests/backtrace-dwarf | grep main?

Yes, it does. This can be reproduced with gcc 8.* on x86_64:

$ cat cold.c
int main(int argc, char **argv)
{
  if (argc != 111)
    __builtin_abort ();

  return 0;
}

$ gcc cold.c -O2 
$ readelf -s a.out | grep main
 2:           0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
37: 00400430  5 FUNC    LOCAL  DEFAULT   14 main.cold.0
60:           0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
69: 00400440 20 FUNC    GLOBAL DEFAULT   14 main

$ gdb ./a.out
(gdb) r
Starting program: /tmp/a.out

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77a384e9 in __GI_abort () at abort.c:79
#2  0x00400435 in main.cold ()
#3  0x77a39feb in __libc_start_main (main=0x400440 <main>, argc=1, argv=0x7fffdc88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffdc78) at ../csu/libc-start.c:308
#4  0x0040048a in _start () at ../sysdeps/x86_64/start.S:120

With debug info (-g), the backtrace is fine:

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77a384e9 in __GI_abort () at abort.c:79
#2  0x00400435 in main (argc=<optimized out>, argv=<optimized out>) at cold.c:4
#3  0x77a39feb in __libc_start_main (main=0x400440 <main>, argc=1, argv=0x7fffdc88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffdc78) at ../csu/libc-start.c:308
#4  0x0040048a in _start () at ../sysdeps/x86_64/start.S:120
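
For this small example, the question of whether 'main' covers any of the
backtrace addresses can be checked by comparing the address of frame #2
with the symbol ranges printed by readelf. The sketch below hard-codes
the values from the output above (main.cold.0 at 0x400430 with size 5,
main at 0x400440 with size 20, frame address 0x400435). The struct and
function names are made up for illustration. It also assumes the usual
unwinder convention of looking up "return address minus one" for caller
frames, since a call to a noreturn function such as abort can be the
last instruction of its function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative symbol record; the values are copied from the readelf
   output in this message, the type itself is hypothetical.  */
struct sym_range
{
  const char *name;
  uint64_t value;   /* start address */
  uint64_t size;    /* size in bytes */
};

static const struct sym_range cold_c_syms[] = {
  { "main.cold.0", 0x400430, 5 },
  { "main",        0x400440, 20 },
};

/* True if ADDR lies in the half-open range [value, value + size).  */
static bool
sym_covers (const struct sym_range *s, uint64_t addr)
{
  assert (s != NULL);
  return addr >= s->value && addr - s->value < s->size;
}

/* Return the name of the symbol covering ADDR, or NULL if none.  */
static const char *
lookup (uint64_t addr)
{
  for (size_t i = 0; i < sizeof cold_c_syms / sizeof cold_c_syms[0]; i++)
    if (sym_covers (&cold_c_syms[i], addr))
      return cold_c_syms[i].name;
  return NULL;
}
```

With these numbers, lookup (0x400435) finds nothing, because 0x400435 is
one byte past the end of main.cold.0, while lookup (0x400435 - 1) returns
"main.cold.0". Neither address falls inside the range of 'main', so a
lookup that relies on the symbol table alone cannot report 'main' for
this frame.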

> 
> Now if it does, the question is why didn't we see it?
> Is main.cold.1 an alias? Then we probably should look harder/smarter.
> Or does it not cover any of the backtrace addresses?

Maybe because a jump instruction is used instead of a call?

00400440 <main>:
  400440:   48 83 ec 08             sub    $0x8,%rsp
  400444:   83 ff 6f                cmp    $0x6f,%edi
  400447:   0f 85 e3 ff ff ff       jne    400430 <main.cold.0>
  40044d:   31 c0                   xor    %eax,%eax
  40044f:   48 83 c4 08             add    $0x8,%rsp
  400453:   c3                      retq
  400454:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40045b:   00 00 00
  40045e:   66 90                   xchg   %ax,%ax
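
The jne above is a conditional jump with a 32-bit relative displacement
(opcode bytes 0f 85 followed by a little-endian rel32), so its target
can be checked by hand. The sketch below redoes that arithmetic for the
six bytes at 0x400447. The function name is hypothetical and it handles
only this one encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the target of an x86-64 'jcc rel32'
   instruction (two opcode bytes 0f 8x, then a little-endian signed
   32-bit displacement).  The displacement is relative to the address
   of the next instruction, i.e. INSN_ADDR + 6.  */
static uint64_t
jcc_rel32_target (uint64_t insn_addr, const uint8_t insn[6])
{
  assert (insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80);

  uint32_t raw = (uint32_t) insn[2]
                 | (uint32_t) insn[3] << 8
                 | (uint32_t) insn[4] << 16
                 | (uint32_t) insn[5] << 24;
  int32_t rel = (int32_t) raw;        /* 0xffffffe3 is -29 */

  return insn_addr + 6 + (uint64_t) (int64_t) rel;
}

/* The bytes of the jne at 0x400447 from the disassembly above.  */
static const uint8_t jne_bytes[6] = { 0x0f, 0x85, 0xe3, 0xff, 0xff, 0xff };
```

jcc_rel32_target (0x400447, jne_bytes) gives 0x400430, the start of
main.cold.0 in the symbol table. Control reaches the cold part through a
jump, so no return address inside the range of 'main' is pushed on the
stack, which would be consistent with the guess above.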

I'm not an expert in libbacktrace, so maybe the truth lies somewhere else.

Martin

> 
> If there isn't one, or it isn't actually called, then the question is
> whether that is actually legal. It seems that, at least for C and C++
> programs, execution should start in 'main'. If it does not, is that
> because gcc did an illegal transformation? Or does it only look that
> way because we cannot unwind correctly (did it do some tail call)?
> 
> We could just use -fno-reorder-blocks-and-partition. But I would like
> to first really understand why it is necessary.
> 
> If you could maybe post the binary somewhere for inspection that would
> be great.
> 
> Thanks,
> 
> Mark
>