Hi,

This is not a "cooker" bug, but if NFS hasn't been fixed since LM7.2, then 8.0
may ship with a now-known bug...

I've been chasing an NFS bug in back-burner mode for some time.  The first
instance I noticed is on an LM6.1 system:
[pfortin@micron pfortin]$ uptime
 12:19pm  up 206 days,  1:20,  2 users,  load average: 5.03, 4.96, 4.91
[pfortin@micron pfortin]$ netstat -l  (snipped)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp    64288      0 *:2049                  *:*
       ^^^^^
NFS can't be restarted, apparently because some data is stuck on the rx_queue...
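
For anyone who wants to watch for this symptom, here is a rough sketch in
Python that scans `netstat -l` output for a UDP listener on port 2049 with a
non-empty Recv-Q.  The function name and the threshold parameter are my own
invention for illustration, not part of any NFS tool, and it assumes netstat
prints the port numerically as in the output above:

```python
def stuck_udp_listeners(netstat_output, port=2049, threshold=0):
    """Return (recv_q, local_address) pairs for UDP sockets on `port`
    whose Recv-Q is above `threshold` (hypothetical helper)."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign [State]
        if len(fields) < 5 or fields[0] != "udp":
            continue
        if not fields[1].isdigit():
            continue
        recv_q = int(fields[1])
        local = fields[3]
        # Assumes a numeric port after the last colon, e.g. "*:2049"
        if local.rsplit(":", 1)[-1] == str(port) and recv_q > threshold:
            stuck.append((recv_q, local))
    return stuck


sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp    16992      0 *:2049                  *:*
udp        0      0 *:111                   *:*
"""
print(stuck_udp_listeners(sample))
```

A Recv-Q that stays non-zero across several samples is the interesting case;
a single non-zero reading on a busy server may just be normal traffic.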

The other day, one of my LM7.2 systems stopped handling NFS:
[root@woody /]# netstat -l
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp    16992      0 *:2049                  *:*
       ^^^^^
[root@woody /]# ps auxww | grep nfs
root       656  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       657  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       659  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       660  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       661  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       662  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       663  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]

Note how pid 658 is missing; it is the nfsd process named in the kernel oops
further down.  While I no longer have logs to prove it, I recall the LM6.1
system had a similar missing pid when the problem occurred there.
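
A quick way to spot such a gap, sketched in Python.  This assumes the nfsd
threads were started with consecutive pids (as they were here, but that is not
guaranteed) and that the `ps auxww` lines have the 11-column layout shown
above; the function name is hypothetical:

```python
def missing_nfsd_pids(ps_output):
    """Return pids absent from the otherwise consecutive run of [nfsd]
    entries in `ps auxww` output (hypothetical helper)."""
    pids = sorted(
        int(fields[1])
        for fields in (line.split() for line in ps_output.splitlines())
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        if len(fields) >= 11 and fields[-1] == "[nfsd]" and fields[1].isdigit()
    )
    if not pids:
        return []
    present = set(pids)
    # Any pid between the lowest and highest nfsd pid that is not listed
    return [p for p in range(pids[0], pids[-1] + 1) if p not in present]


ps_sample = """\
root       656  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       657  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       659  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
root       660  0.0  0.0     0    0 ?        DW   Mar21   0:01 [nfsd]
"""
print(missing_nfsd_pids(ps_sample))
```

Any pid it reports can then be compared against the "Process nfsd (pid: ...)"
line of an oops in the kernel log.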

/var/log/messages (server):
Apr 10 11:53:58 woody rpc.nfsd: nfssvc: Address already in use

/var/log/kernel/warnings (client):
Apr  8 04:10:52 bones kernel: nfs: server woody not responding, timed out
Apr  9 04:00:17 bones kernel: nfs: server woody not responding, timed out
Apr  9 04:12:33 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:09:55 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:20:01 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:21:24 bones kernel: nfs: server woody not responding, timed out
Apr 10 00:23:41 bones kernel: nfs: server woody not responding, timed out
Apr 10 04:00:18 bones kernel: nfs: server woody not responding, timed out
Apr 10 04:11:36 bones kernel: nfs: server woody not responding, timed out
# trying to access NFS mounted files...
Apr 10 11:33:19 bones kernel: nfs: server woody not responding, timed out
Apr 10 11:36:00 bones last message repeated 4 times
Apr 10 11:36:00 bones kernel: nfs_read_super: get root fattr failed
Apr 10 11:40:24 bones kernel: nfs: server woody not responding, timed out
Apr 10 11:40:24 bones kernel: nfs_read_super: get root fattr failed
Apr 10 11:50:51 bones kernel: nfsd: terminating on signal 9
Apr 10 11:50:51 bones last message repeated 7 times
Apr 10 11:50:51 bones kernel: nfsd: last server exiting

/var/log/kernel/errors (server):
Apr  7 12:01:15 woody kernel: Unable to handle kernel NULL pointer dereference
at virtual address 00000000
Apr  7 12:01:15 woody kernel: current->tss.cr3 = 00101000, %%cr3 = 00101000

/var/log/kernel/warnings (server):
Apr  7 12:01:15 woody kernel: Oops: 0000
Apr  7 12:01:15 woody kernel: CPU:    0
Apr  7 12:01:15 woody kernel: EIP:    0010:[<00000000>]
Apr  7 12:01:15 woody kernel: EFLAGS: 00013283
Apr  7 12:01:15 woody kernel: eax: 00000000   ebx: c03907c4   ecx: 00000000  
edx: c3ee9000
Apr  7 12:01:15 woody kernel: esi: c02824d8   edi: c03907cc   ebp: c02824d8  
esp: c262de08
Apr  7 12:01:15 woody kernel: ds: 0018   es: 0018   ss: 0018
Apr  7 12:01:15 woody kernel: Process nfsd (pid: 658, process nr: 32,
stackpage=c262d000)
Apr  7 12:01:15 woody kernel: Stack: c03907c4 c03907cc c0134a3b c03907c4
6401a8c0 c48485b3 c32040c0 00000000
Apr  7 12:01:15 woody kernel:        c02824d8 00296ea6 c3ee9000 04000001
c2640120 c02824d8 c013510b c3ee9000
Apr  7 12:01:15 woody kernel:        00296ea6 c02824d8 00000013 c2640120
00000000 02000000 00296e5b c2948984
Apr  7 12:01:15 woody kernel: Call Trace: [get_new_inode+147/304]
[autofs:__insmod_autofs_O/lib/modules/2.2.17-21mdk/fs/autofs.o_M39D+-14925/96]
[iget_in_use+95/284] [handle_IRQ_event+61/120] [<c486a248>]
[do_8259A_IRQ+143/156] [<c486a519>]
Apr  7 12:01:15 woody kernel:        [<c486a50a>] [<c4868aea>] [<c4871cc0>]
[<c4871cc0>] [<c486845e>] [<c4871cc0>] [do_IRQ+42/72] [<c485590e>]
Apr  7 12:01:15 woody kernel:        [<c4871c4c>] [<c486d628>] [<c48682c3>]
[<c486825d>] [kernel_thread+31/56] [kernel_thread+40/56]
Apr  7 12:01:15 woody kernel: Code: Bad EIP value.
Apr  7 12:01:15 woody kernel: *pde = 00000000

When this NFS crash occurs, it is not possible AFAIK to restart NFS without
rebooting...  I'm not sure what causes these crashes, though I do recall we had
just used the LM6.1 system's CD to install LM7.2 on a Sony Vaio...  On the LM7.2
system (woody.pfortin.com), it was serving an old W98 disk which I was trying to
access with Wine.

None of the above systems have been restarted, so NFS is still dead; in case
anyone wants more info...

HTH,
Pierre
