:
:Matt, I told you about this before, but completely forgot about it. After
:doing considerable testing on my test servers, i thought -current was safe
:enough to try on our production shell servers. I installed -current on one
:of my servers, and to my dismay, it hung. :)
:
:Within 5 minutes of running, nearly every process is blocked on 'inode',
:with the exception of a single 'cp' stuck in vmopar.
:
:I have a very silly, *very* poorly written script i run out of cron, every
:10 mins or so, to update my passwd and group files.
:
:#/bin/sh
:
:cp /home/private/passwd /etc
:cp /home/private/master.passwd /etc
:cp /home/private/group /etc
:rm /etc/spwd.db.tmp >/dev/null 2>&1
:pwd_mkdb /etc/master.passwd
:
:Kevin
:

( Also in a later conversation Kevin indicated that a cron job on the
server was updating /home/private/, creating a race between the server
operating on /home/private and the client trying to copy files from
/home/private. It is this race which is revealing the bug ).

I've managed to repeat the problem with two scripts.

On the server:

    while (1)
        cp file1 file2
        echo -n "l"
    end

And on the client:

    while (1)
        cp file2 /tmp/test3
        echo -n "C"
    end

On the client:

    ccccccccccccccccp: /tmp/test3: Bad address
    cccccccccccccccp: /tmp/test3: Bad address
    cccp: /tmp/test3: Bad address
    cccccccccp: /tmp/test3: Bad address
    ccccccccccccccccccp: /tmp/test3: Bad address
    cccccccccp: /tmp/test3: Bad address
    cccccccccccccp: /tmp/test3: Bad address
    cccccccccccccccccccccccccccp: /tmp/test3: Bad address
    cccccccccccccccccccccccccccccccccp: /tmp/test3: Bad address
    ccccccccccccccccccccccccccccccccccccccccccccccccccccccccp: /tmp/test3: Bad address
    cc<hang>

( The Bad address errors are correct for NFS considering what the server
is doing to the poor file. The hang of course is not )

The cp process on the client gets stuck in vmopar, as previously
reported. Fortunately I can have a gdb already running on the client on
the live kernel so it's easy to see what is going on.

The problem is a same-process deadlock. A VM fault occurs accessing an
NFS-backed page. The fault locks (PG_BUSY's) the page in question then
calls vnode_pager_getpages() to bring the page in. This filters down
into an nfs_getpages() call which then calls nfs_readrpc().

nfs_readrpc() normally ( and properly ) tries to keep the vnode
synchronized to the NFS state returned by the RPC. The problem is that
if the state indicates that the server has truncated the file,
vnode_pager_setsize() will be called and will attempt to remove all the
pages beyond the truncation point from the VM object.

Unfortunately, at least one of those pages has been locked by the same
process. Bewm. Deadlock.

So, how to fix? The only thing I can think of is to pass a flag to
nfs_readrpc() so it knows the RPC is related to a VM fault, and to then
allow nfs_readrpc() to leave np->n_size and vap->va_size
*unsynchronized* if a file truncation occurs, i.e. to avoid calling
vnode_pager_setsize() and thus avoid the deadlock.
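
For anyone who wants to poke at the idea outside the kernel, here is a rough user-space model of the deadlock and of the proposed flag. It is only a sketch: every name in it (model_page, model_node, model_setsize, model_readrpc, the from_fault argument, the 4096-byte page size) is made up for illustration and none of it is the real NFS or VM code. Where the kernel would sleep in vmopar, the model just reports that the caller would be waiting on a page it busied itself.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096   /* assumed page size, illustration only */
#define MODEL_NPAGES    4

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_page {
    bool resident;        /* page is present in the VM object */
    bool busy_by_caller;  /* models PG_BUSY set by the faulting process */
};

struct model_node {
    size_t size;                            /* models np->n_size / va_size */
    struct model_page pages[MODEL_NPAGES];
};

enum model_result {
    MODEL_OK,             /* size synchronized, pages past EOF removed */
    MODEL_SELF_DEADLOCK,  /* would sleep forever on our own busy page */
    MODEL_SIZE_DEFERRED   /* truncation seen during a fault, size left stale */
};

/*
 * Models vnode_pager_setsize(): remove every page at or beyond the new
 * end of file. The real code sleeps on a busy page; the model only
 * detects the case where the sleeper is the one holding the page busy.
 */
enum model_result
model_setsize(struct model_node *n, size_t nsize)
{
    size_t first = (nsize + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

    for (size_t i = first; i < MODEL_NPAGES; i++) {
        struct model_page *p = &n->pages[i];

        if (!p->resident)
            continue;
        if (p->busy_by_caller)
            return MODEL_SELF_DEADLOCK;
        p->resident = false;
    }
    n->size = nsize;
    return MODEL_OK;
}

/*
 * Models the attribute handling at the end of a read RPC. With
 * from_fault set, a truncation reported by the server is noted but the
 * local size is deliberately left unsynchronized.
 */
enum model_result
model_readrpc(struct model_node *n, size_t server_size, bool from_fault)
{
    if (from_fault && server_size < n->size)
        return MODEL_SIZE_DEFERRED;
    return model_setsize(n, server_size);
}
```

With page 0 resident and busied by the caller, model_readrpc(&n, 0, false) reports MODEL_SELF_DEADLOCK, while model_readrpc(&n, 0, true) returns MODEL_SIZE_DEFERRED and leaves n.size untouched. The model does not say when the deferred size would eventually get synchronized, which is exactly the open question.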

This is kinda icky. We have no opportunity anywhere to call
vnode_pager_setsize() because the faulted page must remain BUSY'd
throughout the entire getpages operation.

Comments? ( If I haven't confused the bajeezus out of everyone, that is :-) )

                                        -Matt
                                        Matthew Dillon
                                        <dil...@backplane.com>

17754 c41d4940 c45e9000 0 15192 15192 804006 S cp vmopar c049b930

vm_page_t 0xc049b930:
    object = 0xc46297b4, pindex = 0x0, phys_addr = 0x2732000,
    queue = 0x0, flags = 0x83, (PG_BUSY|PG_WANTED|PG_REFERENCED)
    pc = 0x32, wire_count = 0x0, hold_count = 0x0, act_count = 0x0,
    busy = 0x0, valid = 0x0, dirty = 0x0

#0  mi_switch () at ../../kern/kern_synch.c:827
#1  0xc0137f21 in tsleep (ident=0xc049b930, priority=0x4,
    wmesg=0xc023bac7 "vmopar", timo=0x0) at ../../kern/kern_synch.c:443
#2  0xc01e8f12 in vm_object_page_remove (object=0xc46297b4, start=0x0,
    end=0x1, clean_only=0x0) at ../../vm/vm_page.h:555
#3  0xc01ed93f in vnode_pager_setsize (vp=0xc46208c0,
    nsize=0x0000000000000000) at ../../vm/vnode_pager.c:285
#4  0xc01a3017 in nfs_loadattrcache (vpp=0xc45eab94, mdp=0xc45eaba0,
    dposp=0xc45eaba4, vaper=0x0) at ../../nfs/nfs_subs.c:1383
#5  0xc01abc7c in nfs_readrpc (vp=0xc46208c0, uiop=0xc45eac08,
    cred=0xc09ba400) at ../../nfs/nfs_vnops.c:1060
#6  0xc0184f05 in nfs_getpages (ap=0xc45eac44) at ../../nfs/nfs_bio.c:154
#7  0xc01edefa in vnode_pager_getpages (object=0xc46297b4, m=0xc45eacec,
    count=0x1, reqpage=0x0) at vnode_if.h:1067
#8  0xc01e2069 in vm_fault (map=0xc41d8d40, vaddr=0x28057000,
    fault_type=0x1, fault_flags=0x0) at ../../vm/vm_pager.h:130
#9  0xc0207508 in trap_pfault (frame=0xc45ead94, usermode=0x0,
    eva=0x28057000) at ../../i386/i386/trap.c:791
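
As a small aid for reading the vm_page dump above, the flags word 0x83 can be decoded by hand or with a throwaway helper like the one below. The bit values are an assumption inferred from the dump itself (0x83 printed as exactly PG_BUSY|PG_WANTED|PG_REFERENCED); check vm/vm_page.h in the tree you are debugging before relying on them. The MODEL_ prefix and the function name are made up for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed bit values, inferred from the dump; verify against vm/vm_page.h. */
#define MODEL_PG_BUSY       0x0001u
#define MODEL_PG_WANTED     0x0002u
#define MODEL_PG_REFERENCED 0x0080u

static const struct {
    unsigned    bit;
    const char *name;
} model_flag_names[] = {
    { MODEL_PG_BUSY,       "PG_BUSY" },
    { MODEL_PG_WANTED,     "PG_WANTED" },
    { MODEL_PG_REFERENCED, "PG_REFERENCED" },
};

/*
 * Write the names of the known bits set in flags into buf, separated
 * by '|'. Bits not listed in the table above are ignored. Output is
 * truncated if buf is too small.
 */
void
model_decode_flags(unsigned flags, char *buf, size_t len)
{
    size_t used = 0;
    size_t count = sizeof(model_flag_names) / sizeof(model_flag_names[0]);

    if (len == 0)
        return;
    buf[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        int n;

        if ((flags & model_flag_names[i].bit) == 0)
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
            used ? "|" : "", model_flag_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return;     /* error or truncated, stop here */
        used += (size_t)n;
    }
}
```

The point of interest is that PG_BUSY and PG_WANTED are both set on the page the cp process is sleeping on: the page is busy, someone is waiting for it, and per the backtrace both parties are the same process.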