On Wed, Mar 28, 2001 at 04:45:50PM -0500, Roland McGrath wrote:
> > I wouldn't know how to get it, so I don't know if I can.  What do I need for
> > this?
>
> Does ddb work these days?  Last time I did kernel hacking it was
> oskit-mach, and that dumps a stack trace when it panics.

I don't know, I have never used ddb.

> > If it isn't "wire" I am looking for, I don't know what I am looking for (a
> > grep showed nothing in proc/).
>
> You are right.  proc used to wire itself (wire_task_self), but it doesn't
> now (init does).  So this kernel bug is of more concern than I thought.

I should mention something.  I attached two gdbs, and exited the first one
before the second.  (I didn't clear the suspend count when starting the
first, and it didn't ask me for the suspend count when exiting, as it did in
another session I tried.)  So this might be related to gdb mayhem.  I don't
know if running two gdbs is fine (it shouldn't crash the kernel, but...).

Anyway, I stuck with one gdb only this time and it didn't crash.  After I
exited gdb, the subhurd reported that it can't emulate the crash and would
reboot the Hurd now.  So the thread_invoke kernel panic is either a random
crash or a side effect of the two gdbs.  (I would need to do more testing to
find out.  Reproducing the crash takes about one hour, so I'd like to avoid
that.)

> > Sometimes I wonder if the kernel ring buffer proposed by RMS wouldn't be
> > helpful in situations like this.
>
> Well, maybe.  But it is a lot of overhead.  I'd be more inclined to work
> on a way to make it possible to trace a sub-hurd using rpctrace on
> the parent hurd.

Ok, sounds fine, too.

I have reproduced exactly the crash Jeff reported, and I have collected the
data.  I used a ring buffer of 16 entries (I can increase it if needed), and
the full gdb log is attached.  Here are the three ports on which RPCs were
logged immediately before the crash (in interleaved order, see the left
column).  If a field is blank, it is the same as the previous one in the same
column:

port 218:
real-
order  bits        size  seqno  id
 1.    2147488018    32   1246  24021  dostop
 2.                       1247  24031  task2proc
 3.                       1248  24031
 5.    4370          24   1249  24018  get_arg_locations
 7.    2147488018    32   1250  24030  task2pid
 8.                       1251  24012  child

port 229:
order  bits        size  seqno  id
 4.    2147488018    32      0  24013  setmsgport
 6.    4370          40      1  24017  set_arg_locations
 9.                  24      2  24016  getpids
10.    2147488018   120      3  24022  handle_exceptions
11.                  32      4  24021  dostop
12.                          5  24031  task2proc
13.                          6  24031
15.    4370          24      7  24018  get_arg_locations

port 276:
order  bits        size  seqno  id
14.    2147488018    32      0  24013  setmsgport
16.    4370          40      1  24017  set_arg_locations

*** crash ***

Of course, one data point is not very much.  I can run this a few more
times, and we can see if a pattern emerges.  We can insert assertions etc.
We can probably log whole messages.  Can we run proc single threaded, so
that we know where exactly it crashed?

Thanks,
Marcus

--
`Rhubarb is no Egyptian god.' Debian http://www.debian.org [EMAIL PROTECTED]
Marcus Brinkmann              GNU    http://www.gnu.org    [EMAIL PROTECTED]
[EMAIL PROTECTED]                    http://www.marcus-brinkmann.de
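
For readers who want to picture the logging described above, here is a
minimal sketch of a 16-entry ring buffer that records message headers.  The
names rolling_buffer and rolling_index are taken from the attached gdb log;
everything else is an assumption made for illustration: struct msg_header is
a hypothetical stand-in for mach_msg_header_t (with only the fields visible
in the log), log_header is an invented helper, and rolling_index is assumed
to point at the next slot to be overwritten.  The actual patch to proc is
not shown in this mail and may well look different.  In particular, proc is
multi-threaded, so real code would need a lock around the update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROLLING_SIZE 16   /* number of entries, as mentioned in the mail */

/* Hypothetical stand-in for mach_msg_header_t; real code would use the
   type from <mach/message.h> instead. */
struct msg_header {
    uint32_t msgh_bits;
    uint32_t msgh_size;
    uint32_t msgh_remote_port;
    uint32_t msgh_local_port;
    uint32_t msgh_seqno;
    int32_t  msgh_id;
};

struct msg_header rolling_buffer[ROLLING_SIZE];
unsigned rolling_index;   /* assumed: next slot to be overwritten */

/* Copy one incoming header into the buffer and advance the index,
   wrapping around so that only the last ROLLING_SIZE headers survive.
   Not thread-safe as written. */
void log_header(const struct msg_header *h)
{
    assert(h != NULL);
    rolling_buffer[rolling_index] = *h;
    rolling_index = (rolling_index + 1) % ROLLING_SIZE;
}
```

Under these assumptions, once more than 16 headers have been logged the
oldest surviving entry sits at rolling_buffer[rolling_index] and the newest
one just before it.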
Script started on Thu Mar 29 00:30:37 2001
hurd:~# gdb /proc.exe 86
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-gnu0.2"...
/root/86: No such file or directory.
Attaching to program `/proc.exe', pid 86
warning: Can't modify tracing state for pid 86: No signal thread
Reading symbols from /lib/libhurdbugaddr.so.0.2...done.
Loaded symbols for /lib/libhurdbugaddr.so.0.2
Reading symbols from /lib/libthreads.so.0.2...done.
Loaded symbols for /lib/libthreads.so.0.2
Reading symbols from /lib/libihash.so.0.2...done.
Loaded symbols for /lib/libihash.so.0.2
Reading symbols from /lib/libports.so.0.2...done.
Loaded symbols for /lib/libports.so.0.2
Reading symbols from /lib/libshouldbeinlibc.so.0.2...done.
Loaded symbols for /lib/libshouldbeinlibc.so.0.2
Reading symbols from /lib/libc.so.0.2...done.
Loaded symbols for /lib/libc.so.0.2
Reading symbols from /lib/ld.so...done.
Loaded symbols for /lib/ld.so
Reading symbols from /lib/libmachuser.so.1...done.
Loaded symbols for /lib/libmachuser.so.1
Reading symbols from /lib/libhurduser.so.0.0...done.
Loaded symbols for /lib/libhurduser.so.0.0
[Switching to thread 86.1]
(gdb) cont
Continuing.
warning: Can't wait for pid 86: No child processes

Program received signal EXC_BAD_ACCESS, Could not access memory.
[Switching to thread 86.12]
0x1000100 in ?? ()
(gdb) info thr
  18 thread 86.18  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  17 thread 86.17  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  16 thread 86.16  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  15 thread 86.15  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  14 thread 86.14  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  13 thread 86.13  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
* 12 thread 86.12  0x1000100 in ?? ()
  11 thread 86.11  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  10 thread 86.10  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  9 thread 86.9  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  8 thread 86.8  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  7 thread 86.7  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  6 thread 86.6  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  5 thread 86.5  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  4 thread 86.4  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  3 thread 86.3  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  2 thread 86.2  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  1 thread 86.1  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
(gdb) bt full
#0  0x1000100 in ?? ()
No symbol table info available.
#1  0x1000100 in ?? ()
No symbol table info available.
(gdb) x/5i $pc
0x1000100:      add    %al,(%eax)
0x1000102:      add    %al,(%eax)
0x1000104:      add    %al,(%eax)
0x1000106:      add    %al,(%eax)
0x1000108:      add    %al,(%eax)
(gdb) i reg
eax            0x0          0
ecx            0x1038730    17008432
edx            0xe          14
ebx            0x118d718    18405144
esp            0x128bee0    0x128bee0
ebp            0x128bf18    0x128bf18
esi            0x128df40    19455808
edi            0x803        2051
eip            0x1000100    0x1000100
eflags         0x10207      66055
cs             0x17         23
ss             0x1f         31
ds             0x1f         31
es             0x1f         31
fs             0x1f         31
gs             0x1f         31
fctrl          0x0          0
fstat          0x0          0
ftag           0x0          0
fiseg          0x0          0
fioff          0x0          0
foseg          0x0          0
fooff          0x0          0
fop            0x0          0
(gdb) print rolling_index
$9 = 7
(gdb) print rolling_buffer
$10 = {{msgh_bits = 2147488018, msgh_size = 120, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 3, msgh_id = 24022}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 4, msgh_id = 24021}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 5, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 6, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 276, msgh_seqno = 0, msgh_id = 24013}, {
    msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 7, msgh_id = 24018}, {
    msgh_bits = 4370, msgh_size = 40, msgh_remote_port = 163,
    msgh_local_port = 276, msgh_seqno = 1, msgh_id = 24017}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 218, msgh_seqno = 1246, msgh_id = 24021}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 218, msgh_seqno = 1247, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 218, msgh_seqno = 1248, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 0, msgh_id = 24013}, {
    msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163,
    msgh_local_port = 218, msgh_seqno = 1249, msgh_id = 24018}, {
---Type <return> to continue, or q <return> to quit---
    msgh_bits = 4370, msgh_size = 40, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 1, msgh_id = 24017}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 218, msgh_seqno = 1250, msgh_id = 24030}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163,
    msgh_local_port = 218, msgh_seqno = 1251, msgh_id = 24012}, {
    msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163,
    msgh_local_port = 229, msgh_seqno = 2, msgh_id = 24016}}
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program `/proc.exe' pid 86
hurd:~# exit

Script done on Thu Mar 29 01:28:51 2001
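
To read the rolling_buffer dump in the order the messages arrived, one has
to start at rolling_index and wrap around.  The sketch below shows that
index arithmetic together with a decoding of the two msgh_bits values seen
in the log.  It assumes that rolling_index names the next slot to be
overwritten (so slot 7 holds the oldest entry and slot 6 the newest), which
matches the ordering given in the mail.  The masks follow the msgh_bits
layout in GNU Mach's <mach/message.h> as far as I know it (remote right in
the low byte, local right in the next byte, top bit marking a complex
message); the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ROLLING_SIZE 16

/* Assumed msgh_bits layout, mirroring MACH_MSGH_BITS_COMPLEX and the
   remote/local masks from <mach/message.h>. */
#define BITS_COMPLEX 0x80000000u
#define BITS_REMOTE_MASK 0x000000ffu
#define BITS_LOCAL_MASK 0x0000ff00u

/* Slot holding the n-th oldest entry (n = 0 is the oldest), assuming the
   buffer has wrapped at least once and rolling_index is the next slot to
   be overwritten. */
unsigned slot_of_nth_oldest(unsigned rolling_index, unsigned n)
{
    assert(rolling_index < ROLLING_SIZE && n < ROLLING_SIZE);
    return (rolling_index + n) % ROLLING_SIZE;
}

int bits_is_complex(uint32_t bits)
{
    return (bits & BITS_COMPLEX) != 0;
}

unsigned bits_remote(uint32_t bits)
{
    return bits & BITS_REMOTE_MASK;
}

unsigned bits_local(uint32_t bits)
{
    return (bits & BITS_LOCAL_MASK) >> 8;
}

/* Print one decoded msgh_bits value in a human-readable form. */
void print_bits(uint32_t bits)
{
    printf("0x%08x: complex=%d local=%u remote=%u\n",
           (unsigned) bits, bits_is_complex(bits),
           bits_local(bits), bits_remote(bits));
}
```

With rolling_index = 7 this yields the slot order 7, 8, ..., 15, 0, ..., 6.
Both values in the log decode to local = 17 and remote = 18, which to my
knowledge are MACH_MSG_TYPE_PORT_SEND and MACH_MSG_TYPE_PORT_SEND_ONCE;
2147488018 (0x80001112) additionally has the complex bit set, while 4370
(0x1112) does not.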