A week ago I installed a fresh lucid 10.04 amd64 desktop onto a workstation (Athlon II 240, 4GB ECC RAM, 1TB SATA II disk). Within a day the machine locked up with no response to keyboard or mouse. I could ping it but could not ssh to it. Luckily Magic SysRq + REISUB was able to sync the local disk, although it would not reboot the machine. After looking at the logs I noticed the nfs and kswap errors that eventually brought me to this bug report (I have attached a portion of /var/log/messages showing the similar errors).
At first I couldn't reliably reproduce the lockup; it just happened on its own. However, I was able to reproduce it within a few minutes by running a simple loop that copied a CD image to and from an nfs mount and then diffed the contents. I later found that I could trigger the lockup in under 10 seconds by adding a second instance of the copy loop while also running memtester on half (2GB) of the RAM (memtester allocates and mlocks the RAM). If I run this test from a VT, I can watch the same kernel/nfs dmesg messages shown at the top of this bug report being printed on the VT in real time.

I am using autofs to mount nfs with the following parameters:

    server: rw,sync,no_root_squash,no_subtree_check
    client: rw,hard,intr,tcp,fg,nfsvers=3,rsize=32768,wsize=32768

After reading this bug report and the related ones from kernel development, I got the impression that the problem was fixed in more recent kernels. Luckily the kernel-ppa team has ported the maverick 2.6.35 kernel for use in lucid. I used the following commands to try out the 2.6.35-21 maverick kernel on the lucid workstation:

    sudo add-apt-repository ppa:kernel-ppa/ppa
    sudo apt-get update
    sudo apt-get install linux-headers-2.6.35-21-generic linux-image-generic-lts-backport-maverick
    sudo reboot

Unfortunately the lockup happened even with the maverick kernel.

Apparently this nfs bug is present not only in 2.6.32 but all the way up to 2.6.35 (four different releases), which ultimately means anyone expecting to use lucid or maverick with nfs will either have to live with lockups or hope that it eventually gets fixed. Is there something unique to all of our systems that keeps this from being found during normal regression testing? Or perhaps I should ask whether NFS is even part of the regular testing.
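For anyone who wants to try reproducing this, the copy loop described above could look roughly like the following minimal sketch. It is not the exact script that was run: the function name copy_loop, the scratch file names (copy.iso, the .back suffix) and the pass count are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the reproduction loop: copy a file to a
# directory (e.g. on an nfs mount), copy it back, then compare it
# with the original.
# Usage: copy_loop SRC_FILE DEST_DIR COUNT
copy_loop() {
    src=$1
    dest_dir=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Write the image to the target directory, then read it back.
        cp "$src" "$dest_dir/copy.iso" || return 1
        cp "$dest_dir/copy.iso" "$src.back" || return 1
        # diff -q exits non-zero if the round-tripped copy differs.
        if ! diff -q "$src" "$src.back" >/dev/null; then
            echo "mismatch on pass $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    # Clean up the scratch copies once all passes succeed.
    rm -f "$dest_dir/copy.iso" "$src.back"
    echo "completed $count passes"
}
```

To approximate the faster reproduction, one would start two instances in the background against the nfs mount, each with its own source file and destination directory so the scratch copies do not collide (for example `copy_loop /tmp/a.iso /mnt/nfs/a 100 &` and `copy_loop /tmp/b.iso /mnt/nfs/b 100 &`, where the paths are placeholders), and run `sudo memtester 2048M` alongside them.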
** Attachment added: "/var/log/messages nfs and kswap errors"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/561210/+attachment/1609450/+files/log.txt

-- 
Writing big files to NFS target causes system lock up
https://bugs.launchpad.net/bugs/561210