[Kernel-packages] [Bug 974664] Re: Kernel Oops - BUG: unable to handle kernel paging request at ffffffffffffffb8; RIP: 0010:[] [] nfs_have_delegation+0x9/0x40 [nfs]

Bug Watch Updater Fri, 27 Oct 2017 22:43:13 -0700

Launchpad has imported 33 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=811138.


If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2012-04-10T08:59:31+00:00 Bert wrote:

Description of problem:

Kernel traceback on NFS4 locking attempt

Version-Release number of selected component (if applicable):

kernel 3.3.1-2 (and 3.3.1-3)

How reproducible:

Always

Steps to Reproduce:
1. Install kernel 3.3.1-{2,3}; mount a users home over NFS4
2. Start e.g. pidgin, which tries to lock some file
3.
  
Actual results:

*) Pidgin crashes with 'Killed'.  Strace shows that the last activity
   is:

[...]
open("/users/visics/deknuydt/.config/enchant/en_US.dic", O_RDONLY) = 11
flock(11, LOCK_EX <unfinished ...>
+++ killed by SIGKILL +++
Killed

*) In dmesg, you see:

 [  322.987247] BUG: unable to handle kernel paging request at ffffffffffffffb8
[  322.987330] IP: [<ffffffffa0e140e9>] nfs_have_delegation+0x9/0x40 [nfs]
[  322.987418] PGD 1c07067 PUD 1c08067 PMD 0 
[  322.987495] Oops: 0000 [#1] SMP 
[  322.987565] CPU 1 
[  322.987574] Modules linked in: nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 
xt_state nf_conntrack sha256_generic dm_crypt nvidia(PO) nfsd uinput lockd 
nfs_acl auth_rpcgss sunrpc snd_hda_codec_realtek snd_hda_intel snd_hda_codec 
snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_wdt i2c_i801 iTCO_vendor_support 
intel_rng microcode r8169 i2c_core mii snd_timer snd soundcore snd_page_alloc 
serio_raw firewire_ohci firewire_core crc_itu_t sata_sil24 video [last 
unloaded: scsi_wait_scan]
[  322.988230] 
[  322.988230] Pid: 1697, comm: pidgin Tainted: P           O 
3.3.1-3.fc16.x86_64 #1 transtec AG         /DG31PR
[  322.988230] RIP: 0010:[<ffffffffa0e140e9>]  [<ffffffffa0e140e9>] 
nfs_have_delegation+0x9/0x40 [nfs]
[  322.988230] RSP: 0018:ffff880112e55dd8  EFLAGS: 00010246
[  322.988230] RAX: ffff880122411800 RBX: ffff880112e55e68 RCX: 00000000ffffd8ca
[  322.988230] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[  322.988230] RBP: ffff880112e55dd8 R08: 0000000000016560 R09: ffffea00044b8400
[  322.988230] R10: ffffffffa0e0457a R11: 0000000000000000 R12: 00000000ffffd8ca
[  322.988230] R13: ffff8801218be000 R14: ffff88011573cc00 R15: ffff8801224b0480
[  322.988230] FS:  00007fdda2060980(0000) GS:ffff88012fc80000(0000) 
knlGS:0000000000000000
[  322.988230] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  322.988230] CR2: ffffffffffffffb8 CR3: 0000000112c07000 CR4: 00000000000006e0
[  322.988230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  322.988230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  322.988230] Process pidgin (pid: 1697, threadinfo ffff880112e54000, task 
ffff880112dd2e60)
[  322.988230] Stack:
[  322.988230]  ffff880112e55e28 ffffffffa0e01fa1 ffff880112e11e60 
0000000000000000
[  322.988230]  ffff8801224b0480 ffff8801224b0780 ffff8801224b0480 
0000000000000082
[  322.988230]  ffff880112e55e68 00000000fffffff5 ffff880112e55eb8 
ffffffffa0e04b9c
[  322.988230] Call Trace:
[  322.988230]  [<ffffffffa0e01fa1>] nfs4_handle_exception+0x241/0x3a0 [nfs]
[  322.988230]  [<ffffffffa0e04b9c>] nfs4_proc_lock+0xec/0x440 [nfs]
[  322.988230]  [<ffffffffa0de518d>] do_setlk+0xed/0x110 [nfs]
[  322.988230]  [<ffffffffa0de5239>] nfs_flock+0x89/0xe0 [nfs]
[  322.988230]  [<ffffffff811cbd53>] sys_flock+0x113/0x1c0
[  322.988230]  [<ffffffff815fbfe9>] system_call_fastpath+0x16/0x1b
[  322.988230] Code: fd ff e9 40 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 55 
48 89 e5 66 66 66 66 90 f0 80 4f 48 04 5d c3 55 48 89 e5 66 66 66 66 90 <48> 8b 
57 b8 31 c0 48 85 d2 74 0c 8b 4a 30 83 e6 03 21 f1 39 f1 
[  322.988230] RIP  [<ffffffffa0e140e9>] nfs_have_delegation+0x9/0x40 [nfs]
[  322.988230]  RSP <ffff880112e55dd8>
[  322.988230] CR2: ffffffffffffffb8
[  322.988230] ---[ end trace 158525064a4030cd ]---


Expected results:

*) No NFS4 delegation problems.

Additional info:

1) Koji kernel 3.3.1-3 has exactly the same problem.

2) Kernel 3.3.0-8 does not have the problem.

3) Sorry for the tainted kernel (nvidia).  I'll confirm with an untainted
   asap.

4) It seems that Ubuntu has this problem too.
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/3

------------------------------------------------------------------------
On 2012-04-10T12:37:33+00:00 Tim wrote:

We are suffering the same problem, it occurs right after login when
gnome-keyring tries to access the home directory mounted via NFSv4 with
kernel 3.3.1-3 on the client. Server is CentOS 6.2. Client-side kernel
3.2.10-3.fc16.x86_64 works for us. Kernel is not tainted.

Apr 10 14:26:32 morgan kernel: [   54.179536] BUG: unable to handle kernel 
paging request at ffffffffffffffb8
Apr 10 14:26:32 morgan kernel: [   54.179575] IP: [<ffffffffa03800e9>] 
nfs_have_delegation+0x9/0x40 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.179621] PGD 1c07067 PUD 1c08067 PMD 0 
Apr 10 14:26:32 morgan kernel: [   54.179646] Oops: 0000 [#1] SMP 
Apr 10 14:26:32 morgan kernel: [   54.179665] CPU 1 
Apr 10 14:26:32 morgan kernel: [   54.179675] Modules linked in: fuse nfs 
fscache auth_rpcgss nfs_acl lockd nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT 
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter 
ip6_tables s
nd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq 
snd_seq_device ppdev snd_pcm joydev i2c_i801 snd_timer snd soundcore 
snd_page_alloc serio_raw dcdbas parport_pc e1000e iTCO_wdt iTCO_vendor_support 
parport microcode sunr
pc uinput usb_storage ata_generic pata_acpi radeon ttm drm_kms_helper drm 
i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Apr 10 14:26:32 morgan kernel: [   54.179957] 
Apr 10 14:26:32 morgan kernel: [   54.179966] Pid: 1345, comm: gnome-keyring-d 
Not tainted 3.3.1-3.fc16.x86_64 #1 Dell Inc. OptiPlex 755                 
/0GM819
Apr 10 14:26:32 morgan kernel: [   54.180015] RIP: 0010:[<ffffffffa03800e9>]  
[<ffffffffa03800e9>] nfs_have_delegation+0x9/0x40 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.180061] RSP: 0018:ffff880074255dd8  
EFLAGS: 00010246
Apr 10 14:26:32 morgan kernel: [   54.180084] RAX: ffff8800608f4400 RBX: 
ffff880074255e68 RCX: 00000000ffffd8ca
Apr 10 14:26:32 morgan kernel: [   54.180112] RDX: 0000000000000000 RSI: 
0000000000000001 RDI: 0000000000000000
Apr 10 14:26:32 morgan kernel: [   54.180139] RBP: ffff880074255dd8 R08: 
0000000000016560 R09: ffffea0001dc9a00
Apr 10 14:26:32 morgan kernel: [   54.180167] R10: ffffffffa037057a R11: 
0000000000000000 R12: 00000000ffffd8ca
Apr 10 14:26:32 morgan kernel: [   54.180195] R13: ffff8800608b6000 R14: 
ffff88006041d000 R15: ffff88007793e600
Apr 10 14:26:32 morgan kernel: [   54.180223] FS:  00007ffb5b180800(0000) 
GS:ffff88007da40000(0000) knlGS:0000000000000000
Apr 10 14:26:32 morgan kernel: [   54.180255] CS:  0010 DS: 0000 ES: 0000 CR0: 
000000008005003b
Apr 10 14:26:32 morgan kernel: [   54.180277] CR2: ffffffffffffffb8 CR3: 
00000000779d9000 CR4: 00000000000006e0
Apr 10 14:26:32 morgan kernel: [   54.180305] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Apr 10 14:26:32 morgan kernel: [   54.180333] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
Apr 10 14:26:32 morgan kernel: [   54.180362] Process gnome-keyring-d (pid: 
1345, threadinfo ffff880074254000, task ffff880078782e60)
Apr 10 14:26:32 morgan kernel: [   54.180396] Stack:
Apr 10 14:26:32 morgan kernel: [   54.180405]  ffff880074255e28 
ffffffffa036dfa1 ffff880077268660 0000000000000000
Apr 10 14:26:32 morgan kernel: [   54.180443]  ffff88007793e600 
ffff88007793e780 ffff88007793e600 0000000000000002
Apr 10 14:26:32 morgan kernel: [   54.180479]  ffff880074255e68 
00000000fffffff5 ffff880074255eb8 ffffffffa0370b9c
Apr 10 14:26:32 morgan kernel: [   54.180516] Call Trace:
Apr 10 14:26:32 morgan kernel: [   54.180521]  [<ffffffffa036dfa1>] 
nfs4_handle_exception+0x241/0x3a0 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.180521]  [<ffffffffa0370b9c>] 
nfs4_proc_lock+0xec/0x440 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.180521]  [<ffffffffa035118d>] 
do_setlk+0xed/0x110 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.180521]  [<ffffffffa0351239>] 
nfs_flock+0x89/0xe0 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.180521]  [<ffffffff811cbd53>] 
sys_flock+0x113/0x1c0
Apr 10 14:26:32 morgan kernel: [   54.180521]  [<ffffffff815fbfe9>] 
system_call_fastpath+0x16/0x1b
Apr 10 14:26:32 morgan kernel: [   54.180521] Code: fd ff e9 40 fe ff ff 66 66 
2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 f0 80 4f 48 04 5d c3 55 
48 89 e5 66 66 66 66 90 <48> 8b 57 b8 31 c0 48 85 d2 74 0c 8b 4a 30 83 e6 03 21 
f1 39 f1 
Apr 10 14:26:32 morgan kernel: [   54.180521] RIP  [<ffffffffa03800e9>] 
nfs_have_delegation+0x9/0x40 [nfs]
Apr 10 14:26:32 morgan kernel: [   54.180521]  RSP <ffff880074255dd8>
Apr 10 14:26:32 morgan kernel: [   54.180521] CR2: ffffffffffffffb8
Apr 10 14:26:32 morgan kernel: [   54.180521] ---[ end trace 9fc8f48b5dd9635e 
]---

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/4

------------------------------------------------------------------------
On 2012-04-11T10:19:55+00:00 Josh wrote:

*** Bug 811484 has been marked as a duplicate of this bug. ***

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/5

------------------------------------------------------------------------
On 2012-04-13T17:57:03+00:00 Tim wrote:

Any news on this? It's kinda urgent, had to make all of our desktop
machines boot an old kernel now...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/7

------------------------------------------------------------------------
On 2012-04-14T21:54:32+00:00 Roland wrote:

On our desktop clients kernel 3.3.1-5.fc16.x86_64 shows the same
nfs_have_delegation error when using an NFS mounted home directory. This
system was just installed today with fedora-updates repo enabled. To
reproduce edit any existing file in the home directory with gedit and
try to save it. I can confirm that kernel 3.3.0-8.fc16.x86_64 does not
have this problem.

Apr 14 22:40:11 client kernel: [  913.752179] BUG: unable to handle kernel 
paging request at ffffffffffffffb8
Apr 14 22:40:11 client kernel: [  913.752204] IP: [<ffffffffa039c0e9>] 
nfs_have_delegation+0x9/0x40 [nfs]
Apr 14 22:40:11 client kernel: [  913.752234] PGD 1c07067 PUD 1c08067 PMD 0 
Apr 14 22:40:11 client kernel: [  913.752245] Oops: 0000 [#1] SMP 
Apr 14 22:40:11 client kernel: [  913.752256] CPU 1 
Apr 14 22:40:11 client kernel: [  913.752260] Modules linked in: nfs fscache 
auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defr
ag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack tpm_infineon snd_hda_codec_analog hp_wmi sparse_ke
ymap rfkill ppdev snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device 
snd_pcm microcode snd_timer snd soundcore snd_page_alloc seri
o_raw iTCO_wdt iTCO_vendor_support parport_pc parport e1000e tpm_tis tpm 
tpm_bios sunrpc uinput ata_generic pata_acpi nouveau ttm drm_kms_he
lper drm i2c_core mxm_wmi video wmi [last unloaded: scsi_wait_scan]
Apr 14 22:40:11 client kernel: [  913.752401] 
Apr 14 22:40:11 client kernel: [  913.752405] Pid: 2786, comm: gedit Tainted: G 
         I  3.3.1-5.fc16.x86_64 #1 ***REMOVED***
Apr 14 22:40:11 client kernel: [  913.752424] RIP: 0010:[<ffffffffa039c0e9>]  
[<ffffffffa039c0e9>] nfs_have_delegation+0x9/0x40 [nfs]
Apr 14 22:40:11 client kernel: [  913.752447] RSP: 0018:ffff88005a6a3dd8  
EFLAGS: 00010246
Apr 14 22:40:11 client kernel: [  913.752453] RAX: ffff88006428b400 RBX: 
ffff88005a6a3e68 RCX: 00000000ffffd8ca
Apr 14 22:40:11 client kernel: [  913.752461] RDX: 0000000000000000 RSI: 
0000000000000001 RDI: 0000000000000000
Apr 14 22:40:11 client kernel: [  913.752476] RBP: ffff88005a6a3dd8 R08: 
0000000000016560 R09: ffffea0001694980
Apr 14 22:40:11 client kernel: [  913.752483] R10: ffffffffa038c57a R11: 
0000000000000000 R12: 00000000ffffd8ca
Apr 14 22:40:11 client kernel: [  913.752656] R13: ffff8800659ea800 R14: 
ffff880064289400 R15: ffff8800523599c0
Apr 14 22:40:11 client kernel: [  913.752664] FS:  00007f5846754980(0000) 
GS:ffff88007da80000(0000) knlGS:0000000000000000
Apr 14 22:40:11 client kernel: [  913.752674] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Apr 14 22:40:11 client kernel: [  913.752680] CR2: ffffffffffffffb8 CR3: 
00000000642bb000 CR4: 00000000000006e0
Apr 14 22:40:11 client kernel: [  913.752691] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Apr 14 22:40:11 client kernel: [  913.752699] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
Apr 14 22:40:11 client kernel: [  913.752707] Process gedit (pid: 2786, 
threadinfo ffff88005a6a2000, task ffff8800646fdcc0)
Apr 14 22:40:11 client kernel: [  913.752725] Stack:
Apr 14 22:40:11 client kernel: [  913.752729]  ffff88005a6a3e28 
ffffffffa0389fa1 ffff88005a527660 0000000000000000
Apr 14 22:40:11 client kernel: [  913.752766]  ffff8800523599c0 
ffff880052359000 ffff8800523599c0 0000000000000082
Apr 14 22:40:11 client kernel: [  913.752786]  ffff88005a6a3e68 
00000000fffffff5 ffff88005a6a3eb8 ffffffffa038cb9c
Apr 14 22:40:11 client kernel: [  913.752810] Call Trace:
Apr 14 22:40:11 client kernel: [  913.752827]  [<ffffffffa0389fa1>] 
nfs4_handle_exception+0x241/0x3a0 [nfs]
Apr 14 22:40:11 client kernel: [  913.752846]  [<ffffffffa038cb9c>] 
nfs4_proc_lock+0xec/0x440 [nfs]
Apr 14 22:40:11 client kernel: [  913.752860]  [<ffffffffa036d18d>] 
do_setlk+0xed/0x110 [nfs]
Apr 14 22:40:11 client kernel: [  913.752872]  [<ffffffffa036d239>] 
nfs_flock+0x89/0xe0 [nfs]
Apr 14 22:40:11 client kernel: [  913.752882]  [<ffffffff811cbd23>] 
sys_flock+0x113/0x1c0
Apr 14 22:40:11 client kernel: [  913.752895]  [<ffffffff815fc069>] 
system_call_fastpath+0x16/0x1b
Apr 14 22:40:11 client kernel: [  913.752902] Code: fd ff e9 40 fe ff ff 66 66 
2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 f0 80 4
f 48 04 5d c3 55 48 89 e5 66 66 66 66 90 <48> 8b 57 b8 31 c0 48 85 d2 74 0c 8b 
4a 30 83 e6 03 21 f1 39 f1 
Apr 14 22:40:11 client kernel: [  913.753124] RIP  [<ffffffffa039c0e9>] 
nfs_have_delegation+0x9/0x40 [nfs]
Apr 14 22:40:11 client kernel: [  913.753145]  RSP <ffff88005a6a3dd8>
Apr 14 22:40:11 client kernel: [  913.753150] CR2: ffffffffffffffb8
Apr 14 22:40:11 client kernel: [  913.753161] ---[ end trace a7919e7f17c0a727 
]---

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/8

------------------------------------------------------------------------
On 2012-04-16T11:08:16+00:00 Bert wrote:

Some more tests:

1) kernels 3.3.2-1.fc1{6,7}.x86_64 from Koji have exactly the same
problem.

2) With 3.4.0-0.rc2.git3.1.fc18.x86_64 I get no crash of e.g. pidgin,
   but a hang, and this stuff in the dmesg

[  216.350962] nfs4_reclaim_open_state: 3257 callbacks suppressed
[  216.350965] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.352550] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.354033] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.355580] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.357055] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.358522] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.360042] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.361513] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.363006] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[  216.364582] NFS: nfs4_reclaim_open_state: Lock reclaim failed!

This repeats endlessly till the offending program is killed.

Abt. 120 machines affected here.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/9

------------------------------------------------------------------------
On 2012-04-16T13:35:23+00:00 Josh wrote:

*** Bug 812633 has been marked as a duplicate of this bug. ***

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/10

------------------------------------------------------------------------
On 2012-04-16T18:02:25+00:00 Jeff wrote:

Looks like this is probably fixable by pushing commit 14977489 to
stable.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/14

------------------------------------------------------------------------
On 2012-04-16T19:25:06+00:00 Jeff wrote:

cc'ing Trond...

It looks like this regression crept in with commit 3114ea7a. That commit was
sent to stable kernel series, but commit 149774 (which fixed a bug in the
above commit) was not.

Trond, any chance you can send 149774 to stable as well? We might also want
to take that into Fedora for now until it makes its way into stable kernels.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/16

------------------------------------------------------------------------
On 2012-04-17T16:47:10+00:00 Josh wrote:

I've added 149774 to F15-F17 now.  Rawhide already has it.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/19

------------------------------------------------------------------------
On 2012-04-17T17:00:44+00:00 Josh wrote:

I've started a scratch build here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3999546

testing would be appreciated.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/20

------------------------------------------------------------------------
On 2012-04-17T22:33:28+00:00 Tim wrote:

I'm also seeing this (didn't report; tainted kernel, though I now have
another box set up with an untainted kernel to play with)

I grabbed the SRPM and built myself a kernel-3.3.2-3.fc17.x86_64.rpm

Alas, I can't try my 100% reliable reproducer of "hit the New Message
button in kmail" because this kernel gives me my NFS home directory
owned by an unfeasibly large UID :-)

Under 3.3.0-8.fc17.x86_64 the files are owned by 'tim' but under
kernel-3.3.2-3.fc17.x86_64 it's someone called 4294967294 with GID
4294967294. Which would be a 32-bit -2 which since uid_t is unsigned
32-bit, could be a signed return code cast where it oughtn't.

I know about getting /etc/idmapd.conf correct. I can flip back and forth
between the states by rebooting and selecting the different kernels with
no other change.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/24

------------------------------------------------------------------------
On 2012-04-17T23:47:47+00:00 Josh wrote:

(In reply to comment #11)
> I'm also seeing this (didn't report; tainted kernel, though I now have another
> box set up with an untainted kernel to play with)
> 
> I grabbed the SRPM and built myself a kernel-3.3.2-3.fc17.x86_64.rpm
> 
> Alas, I can't try my 100% reliable reproducer of "hit the New Message button 
> in
> kmail" because this kernel gives me my NFS home directory owned by an
> unfeasibly large UID :-)

F17 has additional NFS patches that are needed to work with the F17
userspace utilities.  You would need to have an F17 SRPM for it to work
properly.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/25

------------------------------------------------------------------------
On 2012-04-18T07:24:59+00:00 Bert wrote:


To Josh Boyer in reply to comment #10:

I tried your 3.3.2-3.fc16.x86_64 build from Koji.  It behaves like the 3.4.0
kernel from F18: the program (pidgin in this case) hangs, and dmesg spits, by 
the thousands:

[   96.542908] nfs4_reclaim_open_state: Lock reclaim failed!
[   96.543687] nfs4_reclaim_open_state: Lock reclaim failed!
[   96.544439] nfs4_reclaim_open_state: Lock reclaim failed!
[   96.545258] nfs4_reclaim_open_state: Lock reclaim failed!
[   96.546066] nfs4_reclaim_open_state: Lock reclaim failed!

So no solution yet ...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/26

------------------------------------------------------------------------
On 2012-04-18T09:23:57+00:00 Tim wrote:

(In reply to comment #12)
> (In reply to comment #11)
> > I'm also seeing this (didn't report; tainted kernel, though I now have 
> > another
> > box set up with an untainted kernel to play with)
> > 
> > I grabbed the SRPM and built myself a kernel-3.3.2-3.fc17.x86_64.rpm
> > 
> > Alas, I can't try my 100% reliable reproducer of "hit the New Message 
> > button in
> > kmail" because this kernel gives me my NFS home directory owned by an
> > unfeasibly large UID :-)
> 
> F17 has additional NFS patches that are needed to work with the F17 userspace
> utilities.  You would need to have an F17 SRPM for it to work properly.

Any instructions for finding/making one? I furtled through
koji.fedoraproject.org and couldn't find one. Would it be useful to try
a 3.4.0 as comment #13 has? (I'm guessing here that f18 has the NFS
patches of which you speak) Or should I drop that spare machine back to
f16 to try this out?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/27

------------------------------------------------------------------------
On 2012-04-18T12:22:36+00:00 Josh wrote:

*** Bug 813732 has been marked as a duplicate of this bug. ***

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/32

------------------------------------------------------------------------
On 2012-04-18T12:24:20+00:00 Josh wrote:

(In reply to comment #14)
> > F17 has additional NFS patches that are needed to work with the F17 
> > userspace
> > utilities.  You would need to have an F17 SRPM for it to work properly.
> 
> Any instructions for finding/making one? I furtled through
> koji.fedoraproject.org and couldn't find one. Would it be useful to try a 
> 3.4.0
> as comment #13 has? (I'm guessing here that f18 has the NFS patches of which
> you speak) Or should I drop that spare machine back to f16 to try this out?

I started an F17 scratch build here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4001445

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/33

------------------------------------------------------------------------
On 2012-04-18T12:47:54+00:00 Josh wrote:

(In reply to comment #13)
> To Josh Boyer in reply to comment #10:
> 
> I tried your 3.3.2-3.fc16.x86_64 build from Koji.  It behaves like the 3.4.0
> kernel from F18: the program (pidgin in this case) hangs, and dmesg spits, by
> the thousands:
> 
> [   96.542908] nfs4_reclaim_open_state: Lock reclaim failed!
> [   96.543687] nfs4_reclaim_open_state: Lock reclaim failed!
> [   96.544439] nfs4_reclaim_open_state: Lock reclaim failed!
> [   96.545258] nfs4_reclaim_open_state: Lock reclaim failed!
> [   96.546066] nfs4_reclaim_open_state: Lock reclaim failed!
> 
> So no solution yet ...

Well, it's not oopsing at least.

Jeff?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/34

------------------------------------------------------------------------
On 2012-04-18T13:55:59+00:00 Tim wrote:

OK. Same "nfs4_reclaim_open_state: Lock reclaim failed!" here.

Even more trivial way to reproduce, BTW:

bash$ /usr/bin/flock /file/on/nfs echo Fish

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/36

------------------------------------------------------------------------
On 2012-04-18T20:32:27+00:00 Roland wrote:

With kernel 3.3.2-3.fc16.x86_64 and saving a file on NFS mounted
partition using gedit I have a similar result as described in comment
#13: no longer a kernel oops and dmesg showing many lines of "Lock
reclaim failed!":

[  302.568318] nfs4_reclaim_open_state: Lock reclaim failed!
[  302.569517] nfs4_reclaim_open_state: Lock reclaim failed!
[  302.570724] nfs4_reclaim_open_state: Lock reclaim failed!
[  302.571894] nfs4_reclaim_open_state: Lock reclaim failed!
[  302.573091] nfs4_reclaim_open_state: Lock reclaim failed!
[  302.574305] nfs4_reclaim_open_state: Lock reclaim failed!
[  302.575437] nfs4_reclaim_open_state: Lock reclaim failed!

I cannot reproduce the "Lock reclaim failed!" error with the flock
command in comment #18 though. Next I tried running

    /usr/bin/flock ~/a sleep 10

in one terminal and

    /usr/bin/flock ~/a echo foo

in another terminal and this kind of locking seems to work fine since
the second flock waits with executing the echo command until the sleep
command has exited.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/37

------------------------------------------------------------------------
On 2012-04-19T10:08:11+00:00 Tim wrote:

Hmm. I have more interesting information....

On 3.3.0-8.fc17.x86_64, where it works, /usr/bin/flock does this (which
is odd, I suppose, but still...):

open("/home/tim/tmp/spon", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX)                 = -1 EIO (Input/output error)
access("/home/tim/tmp/spon", R_OK|W_OK) = 0
close(3)                          = 0
open("/home/tim/tmp/spon", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX)                 = 0

So it opens the file read-only, then tries to LOCK_EX and the NFS server
(entirely reasonably) says "NFS4ERR_OPENMODE, you silly person". Then
and only then does /usr/bin/flock get around to checking the
permissions, opening it read/write, and doing another LOCK_EX, which
works.

On my 3.3.2-4.fc17.x86_64, however, /usr/bin/flock only gets as far as

open("/home/tim/tmp/spon", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX                 ....never returns

while a sniffer trace shows a continuous stream of
OPEN(readonly)/LOCK(write) operations which get a continous stream of
replies of the form "yeah, all right" and "what are you smoking?"
respectively.

So I would venture to suggest that the problem lurks around the
-NFS4ERR_OPENMODE case in nfs4_handle_exception() in fs/nfs/nfs4proc.c.
I've got printk()s which clearly show it's going through that path,
calling nfs4_schedule_stateid_recovery() and dropping down to
wait_on_recovery:

Someone more familiar with the code will probably beat me to working out
what the precise circumstances are in which it should be retrying and
not returning the error, but I shall have a go at working that out
anyway, 'cos I have a warped sense of fun...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/44

------------------------------------------------------------------------
On 2012-04-19T11:28:37+00:00 Jeff wrote:

There are a couple of patches that Trond has proposed upstream and for
stable for the other (non-oops) problem. Would any of the folks
suffering from it be able to test this and see if it fixes the issue?

    http://thread.gmane.org/gmane.linux.nfs/48690/focus=48705

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/45

------------------------------------------------------------------------
On 2012-04-19T13:00:52+00:00 Tim wrote:

(In reply to comment #21)
> There are a couple of patches that Trond has proposed upstream and for stable
> for the other (non-oops) problem. Would any of the folks suffering from it be
> able to test this and see if it fixes the issue?
> 
>     http://thread.gmane.org/gmane.linux.nfs/48690/focus=48705

Applied and tried.
This looks like correct behaviour:

[tim@groke ~]$ /usr/bin/flock ~/tmp/spon echo Fish
flock: /home/tim/tmp/spon: Bad file descriptor

(flock() should never have worked, really :-)

I will try this out with my original kmail problem when I have access to
that desktop again in a few hours.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/46

------------------------------------------------------------------------
On 2012-04-19T13:47:16+00:00 Jeff wrote:

By default, nfs.ko does flock() emulation on top of POSIX locks.

If you find that programs that use flock don't work properly on top of
NFS, you can mount with local_lock=flock and that will disable that
behavior. Of course, no other client will be aware of flock locks at
that point...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/47

------------------------------------------------------------------------
On 2012-04-19T14:05:09+00:00 Tim wrote:

(In reply to comment #23)
> By default, nfs.ko does flock() emulation on top of POSIX locks.
> 
> If you find that programs that use flock don't work properly on top of NFS, 
> you
> can mount with local_lock=flock and that will disable that behavior. Of 
> course,
> no other client will be aware of flock locks at that point...

In which case this may still be a problem. I mount with local_lock=none

The only way to get correct flock() emulation would be to silently
reopen the FD as writable on the NFS server if LOCK_EX is requested (but
presumably keep letting the VFS believe it's read-only). flock() will
cheerfully let you exclusive-lock a read-only FD. Ick :-)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/48

------------------------------------------------------------------------
On 2012-04-19T16:09:38+00:00 Tim wrote:

Aha! flock in f17's util-linux-2.21.1 has a test for EIO, and retries
read-write. That might need an extra

case EBADF:

The check is not present in f16's util-linux-2.20.1

Was wondering why f16 behaved differently.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/49

------------------------------------------------------------------------
On 2012-04-19T17:53:19+00:00 Tim wrote:

(In reply to comment #22)
> I will try this out with my original kmail problem when I have access to that
> desktop again in a few hours.

OK, the original kmail problem is fixed too. This appears simply to
ignore the error it gets back from flock() anyway, once it actually
returns :-)

Interestingly, it too attempts an exclusive lock on a read-only open of
a file with read-write permission. Depending on how common this
behaviour is, it might be less than useless to silently upgrade the NFS
open to read-write on a flock(fd,LOCK_EX) IF (a) the file is actually
writable AND (b) it is somehow possible to know that it *was* flock()
requesting this operation and not fcntl().

...wanders off mumbling...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/50

------------------------------------------------------------------------
On 2012-04-26T13:08:14+00:00 Bert wrote:


3.3.2-6.fc16.x86_64 is all now.  These

nfs4_reclaim_open_state: Lock reclaim failed!

messages are filling /var/log/messages at about 1MB/sec.  So with
an average / filesystem of 8 or 16GB, you can estimate the remaining
uptime after it hits you ...

So can I have this kernel OOPS back please?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/52

------------------------------------------------------------------------
On 2012-04-27T02:18:39+00:00 Ben wrote:

(In reply to comment #27)
> 
> 3.3.2-6.fc16.x86_64 is all now.  These 
> 
> nfs4_reclaim_open_state: Lock reclaim failed!
> 
> messages are filling /var/log/messages at about 1MB/sec.  So with
> an average / filesystem of 8 or 16GB, you can estimate the remaining
> uptime after it hits you ...
> 
> So can I have this kernel OOPS back please?

It looks like this is avoided in mainline due to a previous commit:

commit 96dcadc2fdd111dca90d559f189a30c65394451a
Author: William Dauchy <wdau...@gmail.com>
Date:   Wed Mar 14 12:32:04 2012 +0100

    NFSv4: Rate limit the state manager for lock reclaim warning
messages

This should presumably go into stable as well, though it seems to me
that that still leaves a bug to fix - either the warning is real and we
should try to avoid the problem, or the warning is bogus and should be
removed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/54

------------------------------------------------------------------------
On 2012-04-27T05:19:48+00:00 Tim wrote:

(In reply to comment #28)
> (In reply to comment #27)
> > 
> > 3.3.2-6.fc16.x86_64 is all now.  These 
> > 
> > nfs4_reclaim_open_state: Lock reclaim failed!
> > 
> 
> It looks like this is avoided in mainline due to a previous commit:
> 
> commit 96dcadc2fdd111dca90d559f189a30c65394451a
> Author: William Dauchy <wdau...@gmail.com>
> Date:   Wed Mar 14 12:32:04 2012 +0100
> 
>     NFSv4: Rate limit the state manager for lock reclaim warning messages

Either that or the patch mentioned in comment #21 isn't in yet. If you
have stuff hanging on a flock() attempt, you will need that one or one
very similar.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/55

------------------------------------------------------------------------
On 2012-05-02T11:34:36+00:00 Rik wrote:

Hi,

I've applied the following patches on top of the 3.3.4 kernel I got from
Koji:

https://admin.fedoraproject.org/updates/kernel-3.3.4-1.fc16?_csrf_token=1827f10d3a7a13866a70eb79f6cef4fbc950611d

They seem to resolve this bug. I believe these patches have been queued
by Greg KH for the 3.3 stable queue.

I added the following patches:

NFS: put open context on error in nfs_flush_multi
commit 8ccd271f7a3a846ce6f85ead0760d9d12994a611 upstream

nfs: Enclose hostname in brackets when needed in
commit 98a2139f4f4d7b5fcc3a54c7fddbe88612abed20 upstream.

NFSv4: Ensure that we check lock exclusive/shared type against open modes
commit 55725513b5ef9d462aa3e18527658a0362aaae83 upstream

NFSv4: Ensure that the LOCK code sets exception->inode
commit 05ffe24f5290dc095f98fbaf84afe51ef404ccc5 upstream

I believe only the last two are needed to fix this issue.

Please apply these patches to the next Fedora kernel update.

Rik

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/57

------------------------------------------------------------------------
On 2012-05-02T12:15:31+00:00 Josh wrote:

(In reply to comment #30)
> Hi,
> 
> I've applied the following patches on top of the 3.3.4 kernel I got from Koji:
> 
> https://admin.fedoraproject.org/updates/kernel-3.3.4-1.fc16?_csrf_token=1827f10d3a7a13866a70eb79f6cef4fbc950611d
> 
> They seem to resolve this bug. I believe these patches have been queued by 
> Greg
> KH for the 3.3 stable queue.
> 
> I added the following patches:
> 
> NFS: put open context on error in nfs_flush_multi
> commit 8ccd271f7a3a846ce6f85ead0760d9d12994a611 upstream
> 
> nfs: Enclose hostname in brackets when needed in
> commit 98a2139f4f4d7b5fcc3a54c7fddbe88612abed20 upstream.
> 
> NFSv4: Ensure that we check lock exclusive/shared type against open modes
> commit 55725513b5ef9d462aa3e18527658a0362aaae83 upstream
> 
> NFSv4: Ensure that the LOCK code sets exception->inode
> commit 05ffe24f5290dc095f98fbaf84afe51ef404ccc5 upstream
> 
> I believe only the last two are needed to fix this issue.
> 
> Please apply these patches to the next Fedora kernel update.

Justin put all of those into the F17 kernel last night.  We'll pick them
up in f16 relatively soon.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/58

------------------------------------------------------------------------
On 2012-05-04T07:11:56+00:00 Bert wrote:

Solved in kernel-3.3.4-3.fc16.

Tnx all!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664/comments/59


** Changed in: linux
       Status: Unknown => Fix Released

** Changed in: linux
   Importance: Unknown => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/974664

Title:
  Kernel Oops - BUG: unable to handle kernel paging request at
  ffffffffffffffb8; RIP: 0010:[<ffffffffa05a5839>]  [<ffffffffa05a5839>]
  nfs_have_delegation+0x9/0x40 [nfs]

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Precise:
  Fix Released

Bug description:
  == Precise SRU Justification ==

  This bug is preventing users from using NFS clients on Precise.
  Several users have reported the issue, which has been already fixed
  upstreams but did not made it into stable yet.

  == Fix ==

  There are four commits that are relevant to fix this issue.  From
  mainline kernel:
   - 14977489ffdb80d4caf5a184ba41b23b02fbacd9 (cherry-pick)
   - 96dcadc2fdd111dca90d559f189a30c65394451a (backported)
  From linux-nfs git tree
  (git://git.linux-nfs.org/projects/trondmy/linux-nfs.git):
   - 487790f27df9bb27d3400486bd021dd59edc7589 (cherry-pick)
   - 5de4815015e550bdd33f39650554325540356f0c (cherry-pick)

  == Impact ==

  There are several users reporting the same issue when running NFS
  clients on Precise.  Other reports have also been found with the same
  issue:

  https://bugzilla.redhat.com/show_bug.cgi?id=811138

  == Test Case ==

  According to one of the bug reporters:

  "Logging into a 12.04 client system with autofs5, LDAP, kerberos authenticated
  nfs4 mounts. However this is to a server which is running 10.04 (contrary to 
#4
  it seems) In my case the triggering process is gnome-keyring-d."

  
========================================================================================

  I have no idea.

  ProblemType: KernelOops
  DistroRelease: Ubuntu 12.04
  Package: linux-image-3.2.0-22-generic 3.2.0-22.35
  ProcVersionSignature: Ubuntu 3.2.0-22.35-generic 3.2.14
  Uname: Linux 3.2.0-22-generic x86_64
  NonfreeKernelModules: fglrx
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
  Annotation: Your system might become unstable now and might need to be 
restarted.
  ApportVersion: 2.0-0ubuntu4
  Architecture: amd64
  CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 
not found.
  Date: Thu Apr  5 14:35:46 2012
  Failure: oops
  HibernationDevice: RESUME=UUID=73ced0d0-9777-4d7c-8841-8bd3c57ec88c
  InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
  IwConfig:
   lo        no wireless extensions.

   eth2      no wireless extensions.
  MachineType: Gigabyte Technology Co., Ltd. GA-MA78GM-US2H
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-22-generic 
root=UUID=dd826ebd-811d-4f2b-ac6a-5f527aee88cd ro recovery nomodeset
  PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No 
PulseAudio daemon running, or not running as session daemon.
  RelatedPackageVersions: kerneloops-daemon 0.12+git20090217-1ubuntu18
  RfKill:

  SourcePackage: linux
  Title: BUG: unable to handle kernel paging request at ffffffffffffffb8
  UpgradeStatus: Upgraded to precise on 2012-03-21 (15 days ago)
  dmi.bios.date: 10/08/2009
  dmi.bios.vendor: Award Software International, Inc.
  dmi.bios.version: F8
  dmi.board.name: GA-MA78GM-US2H
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.type: 3
  dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
  dmi.modalias: 
dmi:bvnAwardSoftwareInternational,Inc.:bvrF8:bd10/08/2009:svnGigabyteTechnologyCo.,Ltd.:pnGA-MA78GM-US2H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-MA78GM-US2H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
  dmi.product.name: GA-MA78GM-US2H
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/974664/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 974664] Re: Kernel Oops - BUG: unable to handle kernel paging request at ffffffffffffffb8; RIP: 0010:[] [] nfs_have_delegation+0x9/0x40 [nfs]

Reply via email to