On Wednesday, 11.08.2010, at 04:13 +0100, Ben Hutchings wrote:
> On Mon, 2010-08-09 at 11:24 +0200, Lukas Kolbe wrote:
> > So, testing begins.
> >
> > First conclusion: not all traffic patterns produce the page allocation
> > failure. rdiff-backup only writing to an nfs-share does no harm;
> > rdiff-backup reading and writing (incremental backup) leads to (nearly
> > immediate) error.
> >
> > The nfs-share is always mounted with proto=tcp and nfsv3; /proc/mount says:
> > fileserver.backup...:/export/backup/lbork /.cbackup-mp nfs
> > rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,addr=x.x.x.x
> > 0 0
> [...]
>
> I've seen some recent discussion of a bug in the Linux NFS client that
> can cause it to stop working entirely in case of some packet loss events
> <https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It is possible
> that you are running into that bug. I haven't yet seen an agreement on
> the fix for it.

Thanks, I'll look into it.

I ran some further tests with vanilla and Debian kernels:

VERSION                 WORKING
-------------------------------------------------
2.6.35                  yes
2.6.33.6                yes
2.6.32.17               doesn't boot as kvm guest
2.6.32.17-2.6.32-19     no
2.6.32.17-2.6.32-18     no
2.6.32.16               no

I don't know if this is related to #16494, since I'm unable to trigger it
on 2.6.33.6 or 2.6.35. I'll test 2.6.32 with the patch from
http://lkml.org/lkml/2010/8/10/52 applied as well, and bisect between
2.6.32.17 and 2.6.33.6 in the next few days.

> I also wonder whether the extremely large request sizes (rsize and
> wsize) you have selected are more likely to trigger the allocation
> failure in virtio_net. Please can you test whether reducing them helps?

The large rsize/wsize were chosen automatically, but I'll test with a
failing kernel and [rw]size of 32768.

Kind regards,
Lukas
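
For checking which rsize/wsize are actually in effect after a remount, something like the following could be used. It is only a minimal sketch: it assumes the option string has the same comma-separated form as the /proc/mounts entry quoted above, and the helper name `parse_mount_options` is hypothetical, not part of any existing tool.

```python
def parse_mount_options(options: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Options without a value (e.g. "rw", "hard") map to True;
    "key=value" options map to the value as a string.
    """
    parsed = {}
    for item in options.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Option string copied from the /proc/mounts entry quoted above.
opts = parse_mount_options(
    "rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,"
    "proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,"
    "addr=x.x.x.x"
)

rsize = int(opts["rsize"])
wsize = int(opts["wsize"])

# 32768 is the reduced [rw]size proposed for the next test run.
target = 32768
print("rsize:", rsize, "wsize:", wsize)
print("larger than target:", rsize > target or wsize > target)
```

With the automatically chosen values from the quoted mount entry, both sizes are well above the 32768 planned for the next test.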