Hi,
This is not a "cooker" bug, but if NFS hasn't been fixed since LM7.2, then 8.0
may ship with a now-known bug...
I've been chasing an NFS bug in back-burner mode for some time. I first
noticed it on an LM6.1 system:
[pfortin@micron pfortin]$ uptime
12:19pm up 206 days, 1:20, 2 users, load average: 5.03, 4.96, 4.91
[pfortin@micron pfortin]$ netstat -l (snipped)
Proto Recv-Q Send-Q Local Address   Foreign Address   State
udp    64288      0 *:2049          *:*
       ^^^^^
NFS can't be restarted, apparently because some data is stuck on the rx_queue...
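For anyone who wants to check their own servers for the same symptom, here is a minimal sketch that scans `netstat -l` output for a socket on port 2049 with a non-empty Recv-Q. The function name `stuck_listeners` and the `threshold` parameter are my own inventions for illustration, and it assumes netstat prints the port numerically (as above) rather than as a service name such as `nfs`.

```python
def stuck_listeners(netstat_text, port=2049, threshold=0):
    """Return (proto, recv_q, local_address) for sockets on `port`
    whose Recv-Q is greater than `threshold`.

    Hypothetical helper: assumes whitespace-separated netstat columns
    in the order Proto, Recv-Q, Send-Q, Local Address, Foreign Address.
    """
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a socket line.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        proto, recv_q, local = fields[0], int(fields[1]), fields[3]
        # The port is whatever follows the last ':' in the local address.
        if local.rsplit(":", 1)[-1] == str(port) and recv_q > threshold:
            hits.append((proto, recv_q, local))
    return hits


sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
udp 64288 0 *:2049 *:*
"""
print(stuck_listeners(sample))
```

A non-empty result only says that data is sitting unread on the socket; it does not by itself prove that nfsd has crashed.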
The other day, one of my LM7.2 systems stopped handling NFS:
[root@woody /]# netstat -l
Proto Recv-Q Send-Q Local Address   Foreign Address   State
udp    16992      0 *:2049          *:*
       ^^^^^
[root@woody /]# ps auxww | grep nfs
root 656 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 657 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 659 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 660 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 661 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 662 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 663 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
Note that pid 658 is missing. While I no longer have logs to prove it, I
recall the LM6.1 system had a similar missing pid when the problem occurred
there.
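Spotting the gap by eye is easy with eight threads, but here is a rough sketch for doing it mechanically. It assumes the nfsd threads were started with consecutive pids, which is what I see here but is not guaranteed in general; `missing_pids` is a made-up helper name.

```python
def missing_pids(ps_text, name="[nfsd]"):
    """Return the pids absent from the otherwise consecutive run of
    processes called `name` in `ps auxww` output.

    Assumption (labeled loudly): the threads originally had
    consecutive pids, so any hole marks a thread that has died.
    """
    pids = set()
    for line in ps_text.splitlines():
        fields = line.split()
        # Column 2 of `ps aux` is the pid; the last column is the command.
        if len(fields) > 1 and fields[-1] == name and fields[1].isdigit():
            pids.add(int(fields[1]))
    if not pids:
        return []
    return [p for p in range(min(pids), max(pids) + 1) if p not in pids]


sample = """\
root 656 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 657 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 659 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 660 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 661 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 662 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
root 663 0.0 0.0 0 0 ? DW Mar21 0:01 [nfsd]
"""
print(missing_pids(sample))  # [658]
```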
/var/log/messages (server):
Apr 10 11:53:58 woody rpc.nfsd: nfssvc: Address already in use
/var/log/kernel/warnings (client):
Apr 8 04:10:52 bones kernel: nfs: server woody not responding, timed out
Apr 9 04:00:17 bones kernel: nfs: server woody not responding, timed out
Apr 9 04:12:33 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:09:55 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:20:01 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:21:24 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:23:41 bones kernel: nfs: server woody not responding, timed out
Apr 10 04:00:18 bones kernel: nfs: server woody not responding, timed out
Apr 10 04:11:36 bones kernel: nfs: server woody not responding, timed out
# trying to access NFS mounted files...
Apr 10 11:33:19 bones kernel: nfs: server woody not responding, timed out
Apr 10 11:36:00 bones last message repeated 4 times
Apr 10 11:36:00 bones kernel: nfs_read_super: get root fattr failed
Apr 10 11:40:24 bones kernel: nfs: server woody not responding, timed out
Apr 10 11:40:24 bones kernel: nfs_read_super: get root fattr failed
Apr 10 11:50:51 bones kernel: nfsd: terminating on signal 9
Apr 10 11:50:51 bones last message repeated 7 times
Apr 10 11:50:51 bones kernel: nfsd: last server exiting
/var/log/kernel/errors (server):
Apr 7 12:01:15 woody kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 7 12:01:15 woody kernel: current->tss.cr3 = 00101000, %%cr3 = 00101000
/var/log/kernel/warnings (server):
Apr 7 12:01:15 woody kernel: Oops: 0000
Apr 7 12:01:15 woody kernel: CPU: 0
Apr 7 12:01:15 woody kernel: EIP: 0010:[<00000000>]
Apr 7 12:01:15 woody kernel: EFLAGS: 00013283
Apr 7 12:01:15 woody kernel: eax: 00000000 ebx: c03907c4 ecx: 00000000 edx: c3ee9000
Apr 7 12:01:15 woody kernel: esi: c02824d8 edi: c03907cc ebp: c02824d8 esp: c262de08
Apr 7 12:01:15 woody kernel: ds: 0018 es: 0018 ss: 0018
Apr 7 12:01:15 woody kernel: Process nfsd (pid: 658, process nr: 32, stackpage=c262d000)
Apr 7 12:01:15 woody kernel: Stack: c03907c4 c03907cc c0134a3b c03907c4
6401a8c0 c48485b3 c32040c0 00000000
Apr 7 12:01:15 woody kernel: c02824d8 00296ea6 c3ee9000 04000001
c2640120 c02824d8 c013510b c3ee9000
Apr 7 12:01:15 woody kernel: 00296ea6 c02824d8 00000013 c2640120
00000000 02000000 00296e5b c2948984
Apr 7 12:01:15 woody kernel: Call Trace: [get_new_inode+147/304]
[autofs:__insmod_autofs_O/lib/modules/2.2.17-21mdk/fs/autofs.o_M39D+-14925/96]
[iget_in_use+95/284] [handle_IRQ_event+61/120] [<c486a248>]
[do_8259A_IRQ+143/156] [<c486a519>]
Apr 7 12:01:15 woody kernel: [<c486a50a>] [<c4868aea>] [<c4871cc0>]
[<c4871cc0>] [<c486845e>] [<c4871cc0>] [do_IRQ+42/72] [<c485590e>]
Apr 7 12:01:15 woody kernel: [<c4871c4c>] [<c486d628>] [<c48682c3>]
[<c486825d>] [kernel_thread+31/56] [kernel_thread+40/56]
Apr 7 12:01:15 woody kernel: Code: Bad EIP value.
Apr 7 12:01:15 woody kernel: *pde = 00000000
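The Oops names the faulting process as nfsd with pid 658, which matches the pid missing from the ps listing. As a small sketch for pulling that out of a kernel log automatically, assuming the 2.2-style "Process NAME (pid: N," line shown above (the regex and the name `faulting_processes` are mine, not anything the kernel or klogd provides):

```python
import re

# Matches the 2.2-style Oops line, e.g. "Process nfsd (pid: 658, process nr: 32,"
PROCESS_RE = re.compile(r"Process (\S+) \(pid: (\d+),")


def faulting_processes(log_text):
    """Return (name, pid) for every Oops 'Process' line in the log text."""
    return [(m.group(1), int(m.group(2))) for m in PROCESS_RE.finditer(log_text)]


sample = (
    "Apr 7 12:01:15 woody kernel: ds: 0018 es: 0018 ss: 0018\n"
    "Apr 7 12:01:15 woody kernel: Process nfsd (pid: 658, process nr: 32,\n"
    "stackpage=c262d000)\n"
)
print(faulting_processes(sample))  # [('nfsd', 658)]
```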
When this NFS crash occurs, it is not possible, AFAIK, to restart NFS without
rebooting. I'm not sure what causes these crashes, though I do recall we had
just used the LM6.1 system's CD to install LM7.2 on a Sony Vaio. The LM7.2
system (woody.pfortin.com) was serving an old W98 disk which I was trying to
access with Wine.
Neither of the above systems has been restarted, so NFS is still dead on both,
in case anyone wants more info...
HTH,
Pierre