On Wed, Mar 28, 2001 at 04:45:50PM -0500, Roland McGrath wrote:
> > I wouldn't know how to get it, so I don't know if I can. What do I need for
> > this?
> 
> Does ddb work these days?  Last time I did kernel hacking it was
> oskit-mach, and that dumps a stack trace when it panics.

I don't know, I never used ddb.
 
> > If it isn't "wire" I am looking for, I don't know what I am looking for (a
> > grep showed nothing in proc/).
> 
> You are right.  proc used to wire itself (wire_task_self), but it doesn't
> now (init does).  So this kernel bug is of more concern than I thought.

I should mention something. I attached two gdbs, and exited the first one
before the second. (I didn't clear the suspend count when starting the
first, and it didn't ask me for the suspend count when exiting, as it would
in another session I tried). So this might be related to gdb mayhem. I don't
know if running two gdbs is fine (it shouldn't crash the kernel, but...).

Anyway, I stuck with one gdb only this time and it didn't crash. After I
exited gdb, the subhurd reported that it can't emulate the crash and would
reboot the Hurd now. So the kernel panic in thread_invoke is either a random
crash or a side effect of the two gdbs. I would need to do more testing to
find out, but reproducing the crash takes about one hour, so I'd like to
avoid that.
 
> > Sometimes I wonder if the kernel ring buffer proposed by RMS wouldn't be
> > helpful in situations like this.
> 
> Well, maybe.  But it is a lot of overhead.  I'd be more inclined to work
> on a way to make it possible to trace a sub-hurd using rpctrace on
> the parent hurd.

Ok, sounds fine, too.

I have reproduced exactly the crash Jeff reported. I have collected the data.
I used a ring buffer of 16 entries (can increase if needed), and the full
gdb log is attached. Here are the three ports on which RPCs were logged
immediately before the crash (in interleaved order, see left column). If a
field is blank, it is the same as the previous one in the same column:

  port 218:

order   bits            size    seqno   id
1.      2147488018      32      1246    24021 dostop
2.                              1247    24031 task2proc
3.                              1248    24031
5.      4370            24      1249    24018 get_arg_locations
7.      2147488018      32      1250    24030 task2pid
8.                              1251    24012 child

  port 229:

order   bits            size    seqno   id
4.      2147488018      32      0       24013 setmsgport
6.      4370            40      1       24017 set_arg_locations
9.                      24      2       24016 getpids
10.     2147488018      120     3       24022 handle_exceptions
11.                     32      4       24021 dostop
12.                             5       24031 task2proc
13.                             6       24031
15.     4370            24      7       24018 get_arg_locations

  port 276:

order   bits            size    seqno   id
14.     2147488018      32      0       24013 setmsgport
16.     4370            40      1       24017 set_arg_locations

 *** crash ***
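
For reference, the two values that appear in the bits column can be taken
apart as sketched here. This is only a sketch: it assumes the mask and type
constants from Mach's <mach/message.h>, and the helper name decode_msgh_bits
is made up for illustration, it is not part of proc or gdb.

```python
# Constants as found in Mach's <mach/message.h> (assumed unchanged here).
MACH_MSGH_BITS_COMPLEX = 0x80000000      # body carries ports or out-of-line data
MACH_MSGH_BITS_REMOTE_MASK = 0x000000FF  # disposition of msgh_remote_port
MACH_MSGH_BITS_LOCAL_MASK = 0x0000FF00   # disposition of msgh_local_port

PORT_TYPE_NAMES = {
    16: "MACH_MSG_TYPE_PORT_RECEIVE",
    17: "MACH_MSG_TYPE_PORT_SEND",
    18: "MACH_MSG_TYPE_PORT_SEND_ONCE",
}


def decode_msgh_bits(bits):
    """Split a msgh_bits word into its complex flag and port dispositions."""
    return {
        "complex": bool(bits & MACH_MSGH_BITS_COMPLEX),
        "remote": bits & MACH_MSGH_BITS_REMOTE_MASK,
        "local": (bits & MACH_MSGH_BITS_LOCAL_MASK) >> 8,
    }


# The two msgh_bits values that occur in the log.
for bits in (2147488018, 4370):
    d = decode_msgh_bits(bits)
    remote = PORT_TYPE_NAMES.get(d["remote"], d["remote"])
    local = PORT_TYPE_NAMES.get(d["local"], d["local"])
    print(f"{bits:#010x}: complex={d['complex']} remote={remote} local={local}")
```

Both values carry the same port dispositions and differ only in the complex
bit, so the bits column effectively separates messages whose body carries
port rights or out-of-line data from those that are plain inline data.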

Of course, one data point is not very much. I can run this a few more times,
and we can see if a pattern emerges. We can also insert assertions, and we
can probably log whole messages. Can we run proc single-threaded, so that we
know exactly where it crashed?

Thanks,
Marcus


-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org [EMAIL PROTECTED]
Marcus Brinkmann              GNU    http://www.gnu.org    [EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.marcus-brinkmann.de
Script started on Thu Mar 29 00:30:37 2001
hurd:~# gdb /proc.exe 86
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-gnu0.2"...
/root/86: No such file or directory.
Attaching to program `/proc.exe', pid 86

warning: Can't modify tracing state for pid 86: No signal thread
Reading symbols from /lib/libhurdbugaddr.so.0.2...done.
Loaded symbols for /lib/libhurdbugaddr.so.0.2
Reading symbols from /lib/libthreads.so.0.2...done.
Loaded symbols for /lib/libthreads.so.0.2
Reading symbols from /lib/libihash.so.0.2...done.
Loaded symbols for /lib/libihash.so.0.2
Reading symbols from /lib/libports.so.0.2...done.
Loaded symbols for /lib/libports.so.0.2
Reading symbols from /lib/libshouldbeinlibc.so.0.2...done.
Loaded symbols for /lib/libshouldbeinlibc.so.0.2
Reading symbols from /lib/libc.so.0.2...done.
Loaded symbols for /lib/libc.so.0.2
Reading symbols from /lib/ld.so...done.
Loaded symbols for /lib/ld.so
Reading symbols from /lib/libmachuser.so.1...done.
Loaded symbols for /lib/libmachuser.so.1
Reading symbols from /lib/libhurduser.so.0.0...done.
Loaded symbols for /lib/libhurduser.so.0.0
[Switching to thread 86.1]
(gdb) cont
Continuing.
warning: Can't wait for pid 86: No child processes

Program received signal EXC_BAD_ACCESS, Could not access memory.
[Switching to thread 86.12]
0x1000100 in ?? ()
(gdb) info thr
  18 thread 86.18  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  17 thread 86.17  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  16 thread 86.16  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  15 thread 86.15  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  14 thread 86.14  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  13 thread 86.13  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
* 12 thread 86.12  0x1000100 in ?? ()
  11 thread 86.11  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  10 thread 86.10  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  9 thread 86.9  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  8 thread 86.8  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  7 thread 86.7  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  6 thread 86.6  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  5 thread 86.5  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  4 thread 86.4  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  3 thread 86.3  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  2 thread 86.2  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
  1 thread 86.1  0x106109c in __mach_msg_trap () from /lib/libc.so.0.2
(gdb) bt full
#0  0x1000100 in ?? ()
No symbol table info available.
#1  0x1000100 in ?? ()
No symbol table info available.
(gdb) x/5i $pc
0x1000100:      add    %al,(%eax)
0x1000102:      add    %al,(%eax)
0x1000104:      add    %al,(%eax)
0x1000106:      add    %al,(%eax)
0x1000108:      add    %al,(%eax)
(gdb) i reg
eax            0x0      0
ecx            0x1038730        17008432
edx            0xe      14
ebx            0x118d718        18405144
esp            0x128bee0        0x128bee0
ebp            0x128bf18        0x128bf18
esi            0x128df40        19455808
edi            0x803    2051
eip            0x1000100        0x1000100
eflags         0x10207  66055
cs             0x17     23
ss             0x1f     31
ds             0x1f     31
es             0x1f     31
fs             0x1f     31
gs             0x1f     31
fctrl          0x0      0
fstat          0x0      0
ftag           0x0      0
fiseg          0x0      0
fioff          0x0      0
foseg          0x0      0
fooff          0x0      0
fop            0x0      0
(gdb) print rolling_index
$9 = 7
(gdb) print rolling_buffer
$10 = {{msgh_bits = 2147488018, msgh_size = 120, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 3, msgh_id = 24022}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 4, msgh_id = 24021}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 5, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 6, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 276, msgh_seqno = 0, msgh_id = 24013}, {
    msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 7, msgh_id = 24018}, {
    msgh_bits = 4370, msgh_size = 40, msgh_remote_port = 163, 
    msgh_local_port = 276, msgh_seqno = 1, msgh_id = 24017}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 218, msgh_seqno = 1246, msgh_id = 24021}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 218, msgh_seqno = 1247, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 218, msgh_seqno = 1248, msgh_id = 24031}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 0, msgh_id = 24013}, {
    msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163, 
    msgh_local_port = 218, msgh_seqno = 1249, msgh_id = 24018}, {
---Type <return> to continue, or q <return> to quit---
    msgh_bits = 4370, msgh_size = 40, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 1, msgh_id = 24017}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 218, msgh_seqno = 1250, msgh_id = 24030}, {
    msgh_bits = 2147488018, msgh_size = 32, msgh_remote_port = 163, 
    msgh_local_port = 218, msgh_seqno = 1251, msgh_id = 24012}, {
    msgh_bits = 4370, msgh_size = 24, msgh_remote_port = 163, 
    msgh_local_port = 229, msgh_seqno = 2, msgh_id = 24016}}
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program `/proc.exe' pid 86
hurd:~# exit

Script done on Thu Mar 29 01:28:51 2001
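
The rolling_buffer dump above is printed in slot order, not in time order.
A minimal sketch of how it can be unrolled, assuming rolling_index names the
slot that would be overwritten next (so it holds the oldest entry once the
buffer has wrapped). The function name and the tuple layout are illustrative
and not taken from proc; the tuples are (msgh_local_port, msgh_seqno,
msgh_id) copied from the gdb output.

```python
from collections import defaultdict


def chronological(ring, next_index):
    """Return the ring entries oldest-first.

    Assumes next_index is the slot to be written next, i.e. the oldest
    entry in a buffer that has already wrapped around.
    """
    return ring[next_index:] + ring[:next_index]


# (msgh_local_port, msgh_seqno, msgh_id) in the slot order printed by gdb.
ring = [
    (229, 3, 24022), (229, 4, 24021), (229, 5, 24031), (229, 6, 24031),
    (276, 0, 24013), (229, 7, 24018), (276, 1, 24017), (218, 1246, 24021),
    (218, 1247, 24031), (218, 1248, 24031), (229, 0, 24013), (218, 1249, 24018),
    (229, 1, 24017), (218, 1250, 24030), (218, 1251, 24012), (229, 2, 24016),
]

ordered = chronological(ring, 7)  # rolling_index was 7 in the gdb session

# Group by receiving port, keeping the global arrival order as a label.
by_port = defaultdict(list)
for order, (port, seqno, msgh_id) in enumerate(ordered, start=1):
    by_port[port].append((order, seqno, msgh_id))

for port in sorted(by_port):
    print(f"port {port}:")
    for order, seqno, msgh_id in by_port[port]:
        print(f"  {order:2}. seqno {seqno:<5} id {msgh_id}")

# Sanity check: within one port, sequence numbers must be consecutive.
for port, entries in by_port.items():
    seqnos = [seqno for _, seqno, _ in entries]
    assert seqnos == list(range(seqnos[0], seqnos[0] + len(seqnos))), port
```

The consecutive-seqno check is what supports the assumption about
rolling_index: with any other starting slot, at least one port would show
its sequence numbers going backwards.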
