reassign 723982 tulip
thanks

On Sat, Feb 01, 2014 at 02:27:49PM +0100, Yann Dirson wrote:
> [resend with bugs CC'd]
>
> Hello,
>
> Context:
>
> http://bugs.debian.org/734318 - tulip: [amd64] segfaults inside dlopen when
>   loading plugins
> http://bugs.debian.org/723982 - dlopen: segfaults right inside call_init
>
> What we get here is a number of plugins that when dlopen'd cause an
> obscure segfault inside libc code.  Upstream (CC'd) say they have
> heard of such problems (on Ubuntu 13.10), that people have worked
> around by downgrading the compiler.
>
> This sounds like either a toolchain regression, or possibly some
> edge-case that worked by chance with old compilers and now fail.

This is exactly it: the bug is in tulip, and up to now it worked only by
chance on x86_64.

The segfault occurs in dl-init.c, when call_init calls all the init
functions from DT_INIT_ARRAY. This is done in C by this code:

| addrs = (ElfW(Addr) *) (init_array->d_un.d_ptr + l->l_addr);
| for (j = 0; j < jm; ++j)
|   ((init_t) addrs[j]) (argc, argv, env);

which is translated into the following assembly code:

| 0x00007ffff7deb926 <+134>:   lea    0x8(%rbx,%rax,8),%r14
| 0x00007ffff7deb92b <+139>:   nopl   0x0(%rax,%rax,1)
| 0x00007ffff7deb930 <+144>:   mov    %r13,%rdx
| 0x00007ffff7deb933 <+147>:   mov    %r12,%rsi
| 0x00007ffff7deb936 <+150>:   mov    %ebp,%edi
| 0x00007ffff7deb938 <+152>:   callq  *(%rbx)
| 0x00007ffff7deb93a <+154>:   add    $0x8,%rbx
| 0x00007ffff7deb93e <+158>:   cmp    %r14,%rbx
| 0x00007ffff7deb941 <+161>:   jne    0x7ffff7deb930 <call_init+144>
| 0x00007ffff7deb943 <+163>:   pop    %rbx
| 0x00007ffff7deb944 <+164>:   pop    %rbp
| 0x00007ffff7deb945 <+165>:   pop    %r12
| 0x00007ffff7deb947 <+167>:   pop    %r13
| 0x00007ffff7deb949 <+169>:   pop    %r14
| 0x00007ffff7deb94b <+171>:   retq

As you can see, the value of addrs is stored in %rbx and is incremented
by 8 on each iteration of the loop. The segfault occurs at address
0x00007ffff7deb938 when trying to dereference %rbx. When it happens, %rbx
has its upper 32 bits cleared and thus contains only the lower 32 bits of
the address of addrs[j].

Tracing that with GDB, it appeared that %rbx is clobbered in the
System::init constructor from tulip. Among other things, this code probes
the CPU with the CPUID instruction, using this assembly code:

| __asm__ __volatile__ ("xchgl %%ebx,%0\n\t"
|                       "cpuid \n\t"
|                       "xchgl %%ebx,%0\n\t"
|                       : "+r" (b), "=a" (a), "=c" (c), "=d" (d)
|                       : "1" (infoType), "2" (c));

As you can see, %ebx is saved with xchgl before the cpuid instruction and
restored the same way afterwards. While that works correctly on x86, on
x86_64 a write to a 32-bit register zeroes the upper 32 bits of the
corresponding 64-bit register, so the upper half of %rbx is lost. BOOM!

I would suggest using <cpuid.h> (which is available since GCC 4.4) instead
of this buggy assembly code to probe the CPU.
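
As a rough illustration of the <cpuid.h> suggestion, a minimal sketch could
look like the following. It is not tulip code: the helper name cpu_vendor is
made up for this example, and it assumes an x86 or x86_64 target built with
GCC or a compatible compiler. The only call taken from <cpuid.h> is
__get_cpuid, which leaves the handling of %ebx/%rbx to the compiler's header
instead of hand-written xchgl.

```c
#include <cpuid.h>   /* GCC's CPUID helpers: __get_cpuid() */
#include <string.h>

/*
 * Hypothetical helper (not from tulip): copy the 12-character CPU vendor
 * string reported by CPUID leaf 0 into `vendor` and NUL-terminate it.
 * Returns 1 on success, 0 if CPUID leaf 0 is not available.
 */
static int cpu_vendor(char vendor[13])
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid() returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Leaf 0 returns the vendor string in EBX, EDX, ECX (in that order). */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return 1;
}
```

A caller would declare a 13-byte buffer, call cpu_vendor(buf), and print buf
if the return value is 1. Other leaves (feature flags and so on) can be
queried the same way by changing the first argument of __get_cpuid.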

In the meantime I am reassigning the bug to tulip.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net