Source: glibc
Version: 2.19-7
Severity: important
User: debian-al...@lists.debian.org
Usertags: alpha
Justification: Fails to build from source but built in the past.
The test tst-eintr3 sometimes fails in the build of glibc on alpha, and it has done so twice in a row in attempting to build 2.19-7. It's an intermittent fault that appears to occur only on a multiprocessor (SMP) system, which the buildd imago is. Running the test manually 40 or so times never failed when running a UP kernel.

To make testing faster I have used the upstream glibc source on the 2.19 branch, configured with --enable-hardcoded-path-in-tests, and run tst-eintr3 with the --direct option. It occasionally segfaults. Getting a core dump and analysing it with gdb gives the following:

    Core was generated by `/home/mjc/toolchain/glibc-build/nptl/tst-eintr3 --direct'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  start_thread (arg=0x2000121f1f0) at pthread_create.c:243
    243       __resp = &pd->res;
    (gdb) bt full
    #0  start_thread (arg=0x2000121f1f0) at pthread_create.c:243
            pd = 0x2000121f1f0
            unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0 <repeats 17 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x2000003da00 <start_thread>, 0x2000121f1f0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 252416}}}
            not_first_call = <optimized out>
            robust = <optimized out>
            pagesize_m1 = <optimized out>
            sp = <optimized out>
            freesize = <optimized out>
            __PRETTY_FUNCTION__ = "start_thread"
    #1  0x0000020000177d24 in thread_start () at ../ports/sysdeps/unix/sysv/linux/alpha/clone.S:111
    No locals.
    (gdb) disass /m
    Dump of assembler code for function start_thread:
    232     {
       0x000002000003da00 <+0>:     ldah    gp,3(t12)
       0x000002000003da04 <+4>:     lda     gp,-14800(gp)
       0x000002000003da08 <+8>:     lda     sp,-240(sp)
       0x000002000003da14 <+20>:    stq     fp,40(sp)
       0x000002000003da18 <+24>:    mov     sp,fp
       0x000002000003da24 <+36>:    stq     s0,8(sp)
       0x000002000003da28 <+40>:    stq     ra,0(sp)
       0x000002000003da30 <+48>:    stq     s1,16(sp)
       0x000002000003da38 <+56>:    stq     s2,24(sp)
       0x000002000003da3c <+60>:    stq     s3,32(sp)
       0x000002000003da40 <+64>:    stq     a0,224(fp)

    233       struct pthread *pd = (struct pthread *) arg;
    234
    235     #if HP_TIMING_AVAIL
    236       /* Remember the time when the thread was started.  */
    237       hp_timing_t now;
    238       HP_TIMING_NOW (now);
    239       THREAD_SETMEM (pd, cpuclock_offset, now);
    240     #endif
    241
    242       /* Initialize resolver state pointer.  */
    243       __resp = &pd->res;
       0x000002000003da0c <+12>:    rduniq
       0x000002000003da10 <+16>:    ldq     t0,-32656(gp)
       0x000002000003da20 <+32>:    addq    v0,t0,t0
       0x000002000003da2c <+44>:    lda     t1,1208(a0)
       0x000002000003da34 <+52>:    mov     v0,s0
    => 0x000002000003da44 <+68>:    stq     t1,0(t0)

The __resp variable appears to be a thread-local variable being accessed (well, written) via the initial-exec TLS model. The rduniq PALcall should put the thread pointer (from the PCB) into register v0. Now let's check the address being written to at the point of the segfault:

    (gdb) print /x $t0
    $1 = 0x18

That's definitely not a valid memory location, since the first page of memory starting at location 0 should be inaccessible. Checking the thread pointer:

    (gdb) print /x $v0
    $2 = 0x0

Ouch! That looks like the thread pointer in the PCB has not been initialised. Since addq v0,t0,t0 adds the thread pointer to the TLS offset loaded from the GOT, a zero v0 leaves t0 holding just that offset (0x18), which is exactly the faulting address.

Running tst-eintr3 under gdb and setting a breakpoint on line 243 reveals that, in general, the rduniq PALcall does return a valid memory address (presumably the correct thread pointer), but occasionally, on an SMP system, it can return 0. This is as far as I have got with debugging.
Presumably there is a wruniq PALcode call somewhere that sets up the thread pointer in the PCB, and that might be the next place to investigate what is going on.

Cheers
Michael.