Package: src:linux
Version: 3.2.35-2~bpo60+1
Severity: normal
Tags: upstream patch

We are running an NFSv4 server on Debian Squeeze with 3.2.0-0.bpo.4-amd64
and recently started observing performance degradation in the form of
slower response times for all NFS calls. Investigation found elevated CPU
usage at kernel level (%sys), which was tracked down to
mutex_spin_on_owner and nfsd4_release_lockowner.

Sample output of "perf top":

    Events: 70K cycles
     61.46%  [kernel]  [k] mutex_spin_on_owner
     30.68%  [nfsd]    [k] nfsd4_release_lockowner
      1.50%  [kernel]  [k] intel_idle
      0.11%  [kernel]  [k] irq_entries_start
      0.11%  [sunrpc]  [k] svc_recv
    [...]

As the problem at first happened intermittently, but for hours at a time,
network captures were also taken and compared. The problem (slower NFS
response times and elevated kernel CPU usage) was found to correlate with
an elevated rate of RELEASE_LOCKOWNER requests.

The source for nfsd4_release_lockowner looks like this:

    __be32
    nfsd4_release_lockowner(struct svc_rqst *rqstp,
                            struct nfsd4_compound_state *cstate,
                            struct nfsd4_release_lockowner *rlockowner)
    {
    ....
            nfs4_lock_state();

            status = nfserr_locks_held;
            /* XXX: we're doing a linear search through all the lockowners.
             * Yipes! For now we'll just hope clients aren't really using
             * release_lockowner much, but eventually we have to fix these
             * data structures. */
            INIT_LIST_HEAD(&matches);
            for (i = 0; i < LOCK_HASH_SIZE; i++) {
                    list_for_each_entry(sop, &lock_ownerstr_hashtbl[i], so_strhash) {
                            if (!same_owner_str(sop, owner, clid))
                                    continue;
                            list_for_each_entry(stp, &sop->so_stateids,
                                            st_perstateowner) {
                                    lo = lockowner(sop);
                                    if (check_for_locks(stp->st_file, lo))
                                            goto out;
                                    list_add(&lo->lo_list, &matches);
                            }
                    }
            }
            /* Clients probably won't expect us to return with some (but not all)
             * of the lockowner state released; so don't release any until all
             * have been checked. */
            status = nfs_ok;
            while (!list_empty(&matches)) {
                    lo = list_entry(matches.next, struct nfs4_lockowner,
                                    lo_list);
                    /* unhash_stateowner deletes so_perclient only
                     * for openowners. */
                    list_del(&lo->lo_list);
                    release_lockowner(lo);
            }
    out:
            nfs4_unlock_state();
            return status;
    }

So the problem is already acknowledged in a comment in the source itself:
every RELEASE_LOCKOWNER request walks all lockowner hash buckets while
holding the NFSv4 state lock.

Looking through upstream git, it appears a fix was applied in

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=06f1f864d4ae5804e83785308d41f14a08e4b980

We have not verified that this patch applies cleanly or that it actually
resolves the problem, but it seems likely. Can you please consider
applying that patch?

I have looked through existing Debian bug reports and could not find
anything relevant, although this one has superficially similar symptoms:

http://bugs.debian.org/692957

For the record, Dropbox 1.4.0 seems to be triggering an excessive number
of RELEASE_LOCKOWNER requests, thus causing this problem.
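
To make the cost of that loop concrete, here is a minimal user-space sketch in C. It is an illustration only, not the kernel code and not necessarily how the upstream commit addresses the issue. The names struct owner, owner_hash, owner_insert, visits_full_scan and visits_one_bucket are hypothetical, the value of LOCK_HASH_SIZE is illustrative, and the sketch assumes a table keyed on the owner string alone. It compares how many entries are touched when every bucket is walked, as in the quoted loop, with how many are touched when a single bucket can be selected.

```c
#include <assert.h>
#include <stddef.h>

#define LOCK_HASH_SIZE 256  /* illustrative value, not taken from the kernel */

/* Hypothetical stand-in for a lockowner; only the owner string is kept. */
struct owner {
    const char *name;
    struct owner *next;
};

static struct owner *table[LOCK_HASH_SIZE];

/* Simple string hash, used only to pick a bucket in this sketch. */
static unsigned int owner_hash(const char *name)
{
    unsigned int h = 0;

    assert(name != NULL);
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % LOCK_HASH_SIZE;
}

/* Link an owner at the head of the chain for its bucket. */
static void owner_insert(struct owner *o)
{
    unsigned int b = owner_hash(o->name);

    o->next = table[b];
    table[b] = o;
}

/* Entries visited when every bucket is walked, as in the quoted loop. */
static size_t visits_full_scan(void)
{
    size_t visited = 0;
    size_t i;
    const struct owner *o;

    for (i = 0; i < LOCK_HASH_SIZE; i++)
        for (o = table[i]; o != NULL; o = o->next)
            visited++;
    return visited;
}

/* Entries visited when only the bucket for this owner string is walked. */
static size_t visits_one_bucket(const char *name)
{
    size_t visited = 0;
    const struct owner *o;

    for (o = table[owner_hash(name)]; o != NULL; o = o->next)
        visited++;
    return visited;
}
```

With N owners inserted, visits_full_scan() touches all N entries for every request, whereas visits_one_bucket() walks only one chain. Under a high rate of RELEASE_LOCKOWNER requests, all serialized on one lock, the first pattern would be consistent with the mutex_spin_on_owner time seen in the perf output.
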
A separate bug report has been filed with Dropbox:

https://forums.dropbox.com/topic.php?id=96061&replies=1

-- Package-specific info:
** Version:
Linux version 3.2.0-0.bpo.4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Debian 3.2.35-2~bpo60+1

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.2.0-0.bpo.4-amd64 root=UUID=ca958596-33fb-4e2c-9d87-16a74a584ea2 ro console=tty0 quiet

** Not tainted

** Loaded modules:
uinput nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipmi_devintf
ipmi_msghandler loop snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
i7core_edac edac_core ioatdma i2c_i801 snd_timer snd psmouse acpi_cpufreq
mperf i2c_core tpm_tis serio_raw coretemp processor button evdev pcspkr
dca crc32c_intel tpm tpm_bios soundcore snd_page_alloc thermal_sys ext4
mbcache jbd2 crc16 dm_mod raid10 raid1 md_mod sd_mod crc_t10dif usbhid hid
uhci_hcd ehci_hcd usbcore ahci libahci libata e1000e scsi_mod usb_common

-- System Information:
Debian Release: 6.0.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-0.bpo.4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 depends on:
ii  debconf [debconf-2.0]        1.5.36.1      Debian configuration management sy
ii  initramfs-tools [linux-init  0.99~bpo60+1  tools for generating an initramfs
ii  linux-base                   3.4~bpo60+1   Linux image base package
ii  module-init-tools            3.12-1        tools for managing Linux kernel mo

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 recommends:
pn  firmware-linux-free  <none>  (no description available)

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 suggests:
pn  debian-kernel-handbook  <none>            (no description available)
ii  grub-pc                 1.98+20100804-14  GRand Unified Bootloader, version
pn  linux-doc-3.2           <none>            (no description available)

Versions of packages linux-image-3.2.0-0.bpo.4-amd64 is related to:
pn  firmware-atheros        <none>  (no description available)
pn  firmware-bnx2           <none>  (no description available)
pn  firmware-bnx2x          <none>  (no description available)
pn  firmware-brcm80211      <none>  (no description available)
pn  firmware-intelwimax     <none>  (no description available)
pn  firmware-ipw2x00        <none>  (no description available)
pn  firmware-ivtv           <none>  (no description available)
pn  firmware-iwlwifi        <none>  (no description available)
pn  firmware-libertas       <none>  (no description available)
pn  firmware-linux          <none>  (no description available)
pn  firmware-linux-nonfree  <none>  (no description available)
pn  firmware-myricom        <none>  (no description available)
pn  firmware-netxen         <none>  (no description available)
pn  firmware-qlogic         <none>  (no description available)
pn  firmware-ralink         <none>  (no description available)
pn  firmware-realtek        <none>  (no description available)
pn  xen-hypervisor          <none>  (no description available)

-- debconf information:
  linux-image-3.2.0-0.bpo.4-amd64/postinst/missing-firmware-3.2.0-0.bpo.4-amd64:
  linux-image-3.2.0-0.bpo.4-amd64/prerm/removing-running-kernel-3.2.0-0.bpo.4-amd64: true
  linux-image-3.2.0-0.bpo.4-amd64/postinst/depmod-error-initrd-3.2.0-0.bpo.4-amd64: false
  linux-image-3.2.0-0.bpo.4-amd64/postinst/ignoring-ramdisk: