Hi, On Thu, Nov 23, 2000 at 07:57:08PM -0500, Roland McGrath wrote: > > This seems to be some race condition betweenm the sync thread and other > > dn_set_?time mangling stuff. > > I would tend to agree. Notice for example thread 3, which appears to be a > peropen in the process of dying. That is probably the temacs open file > descriptor on the file being written, being closed within the few seconds > while your sleep call was blocking ext2fs from crashing. Something to > think about is that (last I knew) temacs is writing the data with mmap > rather than write; that indicates the possibility of the file pager being > the suspect agent interacting with the sync thread. This is probably strengthened by my new info. I followed your advice and found out that the positive ctime and mtime settings come from block_getblk in ext2fs/getblk.c: (Only the end of the function, with my change 1->14) node->dn_set_ctime = node->dn_set_mtime = 14; node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block; node->dn_stat_dirty = 14; return 0; } I did this test twice, with the same result. You'll find the 14's in the attached stack trace. This seems to point at diskfs_grow/pager_unlock_page vs sync thread (again! ;). I want to point out that the above does not call diskfs_node_update, while inode_getblk does in at least some cases: node->dn_set_ctime = node->dn_set_mtime = 13; node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block; node->dn_stat_dirty = 13; if (diskfs_synchronous || node->dn->info.i_osync) diskfs_node_update (node, 1); return 0; } Note that the assertion failure comes shortly after the message "Dumping under names..." appears on the screen. dump-emacs calls map_out_data, but a grep showed nothing further about map+out+data, I don't know where it is defined... (strange). > A wacky idea > that might help narrow down quickly is to frob every place that sets > dn_set_?time so that instead of setting them to 1 it uses a unique nonzero > value in each location (or maybe fetches the caller's PC or something); > then you should be able to see what code touched it last in the race. Okay, that's done. I could probably set the value dependant on the caller by passing down an arg from pager_unlock_page/diskfs_grow or somewhere, if this might be helpful. > > * The node->lock is held, which should probably avoid syncing? > > > * Not all parts of the system which set some dn_set_?time flag call > > diskfs_node_update consecutively (for example, write_symlink). > > I don't think that ought to be a problem; in fact, it would be seriously > inefficient to make them do so. This just means that the time fields don't > need to be updated (e.g. atime for a read) until the node is making its > normal way out to be synchronized. Whenever someone does call > diskfs_node_update, that will take care of dn_set_?time. But I'm not > entirely clearly on all this code. Me too. Especially I am wondering what *should* guarantee that dn_set_?time are cleared before write_node is entered. I can't see anything in the code that takes care of that. write_node is called by 1. diskfs_write_disknode, which is only called from diskfs_node_update, which calls diskfs_set_node_times, which clears the flags (OKAY); and 2. write_one_disknode (in write_all_disknodes), which calls diskfs_set_node_times, then pokel_sync indirect blocks (with wait=1), then write_node. What we see is a failure in case 2. Between write_one_disknode and write_node, while pokel_sync runs, it seems that the file is grown and block_getblk called. Note the utter lack of any locking here. So what happens when we extend the file while it's indirect block list is pokel_sync'ed to the disk? Is this actually possible? Thanks, Marcus -- `Rhubarb is no Egyptian god.' Debian http://www.debian.org [EMAIL PROTECTED] Marcus Brinkmann GNU http://www.gnu.org [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.marcus-brinkmann.de
Script started on Fri Nov 24 02:50:43 2000 hurd:~# gdb /hrud/ex /hurd/ext2fs.emacs 87n GNU gdb 5.0 Copyright 2000 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-unknown-gnu0.2"... /root/87: No such file or directory. Attaching to program `/hurd/ext2fs.emacs', pid 87 [Switching to thread 87.1] (gdb) inft o thr 79 thread 87.79 0x809b6fc in mach_msg_trap () [..snip..] 2 thread 87.2 0x809b6fc in mach_msg_trap () * 1 thread 87.1 0x809b6fc in mach_msg_trap () (gdb) bt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80707c9 in cproc_block () at ../../libthreads/cprocs.c:665 #3 0x8070bec in condition_wait (c=0x80fa550, m=0x80fa528) at ../../libthreads/cprocs.c:811 #4 0x8071ff2 in cthread_exit (result=0x0) at ../../libthreads/cthreads.c:387 #5 0x804a599 in main (argc=2, argv=0x101ffa4) at ../../ext2fs/ext2fs.c:204 #6 0x80800b0 in __libc_start_main () (gdb) thr 2 [Switching to thread 2 (thread 87.2)] #0 0x809b6fc in mach_msg_trap () (gdb) bt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8076447 in mach_msg_server () #4 0x809c3f9 in _hurd_msgport_receive () #5 0x8071aab in cthread_body (self=0x812af18) at ../../libthreads/cthreads.c:294 (gdb) thr 3 [Switching to thread 3 (thread 87.3)] #0 0x809b6fc in mach_msg_trap () (gdb) bt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=1) at ../../libports/manage-multithread.c:136 #4 0x8073011 in ports_manage_port_operations_multithread (bucket=0x812b3e0, demuxer=0x8055a18 <diskfs_demuxer>, thread_timeout=0, global_timeout=0, hook=0) at ../../libports/manage-multithread.c:164 #5 0x8055627 in master_thread_function (foo=0x0) at ../../libdiskfs/init-first.c:35 #6 0x8071aab in cthread_body (self=0x812b518) at ../../libthreads/cthreads.c:294 (gdb) btthr 3 4 [Switching to thread 4 (thread 87.4)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 4[3Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=1) at ../../libports/manage-multithread.c:136 #4 0x8073011 in ports_manage_port_operations_multithread (bucket=0x812b570, demuxer=0x8062594 <pager_demuxer>, thread_timeout=0, global_timeout=0, hook=0) at ../../libports/manage-multithread.c:164 #5 0x805053a in service_paging_requests (arg=0x812b570) at ../../libdiskfs/disk-pager.c:41 #6 0x8071aab in cthread_body (self=0x812b620) at ../../libthreads/cthreads.c:294 (gdb) btthr 4 5 [Switching to thread 5 (thread 87.5)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 5[3Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x812bbf8) at ../../libthreads/cthreads.c:294 (gdb) btthr 5 6 [Switching to thread 6 (thread 87.6)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 6[3Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x812bf20) at ../../libthreads/cthreads.c:294 (gdb) btthr 6 7 [Switching to thread 7 (thread 87.7)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 7[3Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x812c020) at ../../libthreads/cthreads.c:294 (gdb) btthr 7 8 [Switching to thread 8 (thread 87.8)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 8[3Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x8094474 in sleep () #3 0x804c7e6 in write_node (np=0x8154c20) at ../../ext2fs/inode.c:416 #4 0x804cbf1 in write_one_disknode (node=0x8154c20) at ../../ext2fs/inode.c:592 #5 0x804cb75 in diskfs_node_iterate (fun=0x108ff18) at ../../ext2fs/inode.c:568 #6 0x804ccb6 in write_all_disknodes () at ../../ext2fs/inode.c:599 #7 0x804ea90 in diskfs_sync_everything (wait=0) at ../../ext2fs/pager.c:925 #8 0x8052f44 in periodic_sync (interval=5) at ../../libdiskfs/sync-interval.c:120 #9 0x8071aab in cthread_body (self=0x812c328) at ../../libthreads/cthreads.c:294 (gdb) btthr 8 9 [Switching to thread 9 (thread 87.9)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 9[3Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x81327a8) at ../../libthreads/cthreads.c:294 (gdb) btthr 9 10 [Switching to thread 10 (thread 87.10)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 10[4Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x8132660) at ../../libthreads/cthreads.c:294 (gdb) btthr 10 1 [Switching to thread 11 (thread 87.11)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 11[4Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x8132458) at ../../libthreads/cthreads.c:294 (gdb) btthr 11 2 [Switching to thread 12 (thread 87.12)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 12[4Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x8133610) at ../../libthreads/cthreads.c:294 (gdb) btthr 12 3 [Switching to thread 13 (thread 87.13)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 13[4Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80707c9 in cproc_block () at ../../libthreads/cprocs.c:665 #3 0x8070efc in __mutex_lock_solid (ptr=0x8154cbc) at ../../libthreads/cprocs.c:980 #4 0x805594d in diskfs_release_peropen (po=0x814d640) at ../../libdiskfs/peropen-rele.c:25 #5 0x8052769 in diskfs_protid_rele (arg=0x8149df0) at ../../libdiskfs/protid-rele.c:34 #6 0x8073653 in _ports_complete_deallocate (pi=0x8149df0) at ../../libports/complete-deallocate.c:49 #7 0x807292a in ports_port_deref (portstruct=0x8149df0) at ../../libports/port-deref.c:48 #8 0x8072e89 in internal_demuxer (inp=0x10dbf40, outheadp=0x10ddf40) at ../../libports/manage-multithread.c:104 #9 0x807639a in mach_msg_server_timeout () #10 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #11 0x8071aab in cthread_body (self=0x8133870) at ../../libthreads/cthreads.c:294 (gdb) btthr 13 4 [Switching to thread 14 (thread 87.14)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 14[4Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80763f3 in mach_msg_server_timeout () #3 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #4 0x8071aab in cthread_body (self=0x81333b8) at ../../libthreads/cthreads.c:294 [..snip..] (gdb) thr 8 [Switching to thread 8 (thread 87.8)] #0 0x809b6fc in mach_msg_trap () (gdb) bt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x8094474 in sleep () #3 0x804c7e6 in write_node (np=0x8154c20) at ../../ext2fs/inode.c:416 #4 0x804cbf1 in write_one_disknode (node=0x8154c20) at ../../ext2fs/inode.c:592 #5 0x804cb75 in diskfs_node_iterate (fun=0x108ff18) at ../../ext2fs/inode.c:568 #6 0x804ccb6 in write_all_disknodes () at ../../ext2fs/inode.c:599 #7 0x804ea90 in diskfs_sync_everything (wait=0) at ../../ext2fs/pager.c:925 #8 0x8052f44 in periodic_sync (interval=5) at ../../libdiskfs/sync-interval.c:120 #9 0x8071aab in cthread_body (self=0x812c328) at ../../libthreads/cthreads.c:294 (gdb) up 4 bt full #0 0x809b6fc in mach_msg_trap () No symbol table info available. #1 0x80760fd in mach_msg () No symbol table info available. #2 0x8094474 in sleep () No symbol table info available. #3 0x804c7e6 in write_node (np=0x8154c20) at ../../ext2fs/inode.c:416 st = (struct stat *) 0x8154c2c di = (struct ext2_inode *) 0x30129180 #4 0x804cbf1 in write_one_disknode (node=0x8154c20) at ../../ext2fs/inode.c:592 node = (struct node *) 0x8154c20 di = (struct ext2_inode *) 0x10004005 #5 0x804cb75 in diskfs_node_iterate (fun=0x108ff18) at ../../ext2fs/inode.c:568 err = 0 n = 975031623 num_nodes = 96 node = (struct node *) 0x8154c20 node_list = (struct node **) 0x108fd40 p = (struct node **) 0x108fd50 #6 0x804ccb6 in write_all_disknodes () at ../../ext2fs/inode.c:599 No locals. #7 0x804ea90 in diskfs_sync_everything (wait=0) at ../../ext2fs/pager.c:925 ---Type <return> to continue, or q <return> to quit--- No locals. #8 0x8052f44 in periodic_sync (interval=5) at ../../libdiskfs/sync-interval.c:120 err = 975031623 link = {thread = 58, next = 0x0, prevp = 0x812c0ec, notifies = 0x0, interrupted_next = 0x0} #9 0x8071aab in cthread_body (self=0x812c328) at ../../libthreads/cthreads.c:294 t = 0x812c2b8 (gdb) up 4 #4 0x804cbf1 in write_one_disknode (node=0x8154c20) at ../../ext2fs/inode.c:592 592 ../../ext2fs/inode.c: No such file or directory. (gdb) print *node $1 = {next = 0x80fa9e8, prevp = 0x80fa9e8, dn = 0x814df88, dn_stat = { st_fstype = 23, st_fsid = 87, st_ino = 65540, st_gen = 975031812, st_rdev = 0, st_mode = 33188, st_nlink = 1, st_uid = 0, st_gid = 1005, st_size = 4765562, st_atime = 975031623, st_atime_usec = 60000, st_mtime = 975031623, st_mtime_usec = 370000, st_ctime = 975031623, st_ctime_usec = 370000, st_blksize = 8192, st_blocks = 9336, st_author = 0, st_flags = 0, st_spare = {0 <repeats 11 times>}}, dn_set_ctime = 14, dn_set_atime = 0, dn_set_mtime = 14, dn_stat_dirty = 14, lock = {held = 1, lock = 0, name = 0x0, queue = {head = 0x8133870, tail = 0x8138ad0}}, references = 3, light_references = 1, sockaddr = 0, owner = 0, transbox = {active = 0, lock = 0x8154cbc, flags = 0, wakeup = { lock = 0, queue = {head = 0x0, tail = 0x0}, name = 0x0, implications = 0x0}, cookie = 0x8154c20}, userlock = {type = 8, wait = { lock = 0, queue = {head = 0x0, tail = 0x0}, name = 0x0, implications = 0x0}, waiting = 0, shcount = 0}, conch = { lock = 0x8154cbc, wait = {lock = 0, queue = {head = 0x0, tail = 0x0}, name = 0x0, implications = 0x0}, holder = 0x0, holder_shared_page = 0x0}, dirmod_reqs = 0x0, filemod_reqs = 0x0, allocsize = 4767744, cache_id = 65540, author_tracks_uid = 0} [..snip..] (gdb) btthr 78 9 [Switching to thread 79 (thread 87.79)] #0 0x809b6fc in mach_msg_trap () (gdb) thr 79[4Pbt #0 0x809b6fc in mach_msg_trap () #1 0x80760fd in mach_msg () #2 0x80707c9 in cproc_block () at ../../libthreads/cprocs.c:665 #3 0x8070efc in __mutex_lock_solid (ptr=0x8154cbc) at ../../libthreads/cprocs.c:980 #4 0x804c1d5 in diskfs_cached_lookup (inum=65540, npp=0x14fbd84) at ../../ext2fs/inode.c:78 #5 0x8051550 in diskfs_check_lookup_cache (dir=0x8137a10, name=0x14fdf75 "emacs") at ../../libdiskfs/name-cache.c:256 #6 0x8050f99 in diskfs_lookup (dp=0x8137a10, name=0x14fdf75 "emacs", type=LOOKUP, np=0x14fbe34, ds=0x0, cred=0x8145580) at ../../libdiskfs/lookup.c:149 #7 0x805a1a1 in diskfs_S_dir_lookup (dircred=0x8145580, path=0x14fdf5c "emacs20", flags=0, mode=0, retry=0x14fbf64, retryname=0x14fbf6c "", returned_port=0x14fc370, returned_port_poly=0x14fbe84) at ../../libdiskfs/dir-lookup.c:126 #8 0x80569ea in _Xdir_lookup (InHeadP=0x14fdf40, OutHeadP=0x14fbf40) at fsServer.c:1511 #9 0x8057bab in diskfs_fs_server (InHeadP=0x14fdf40, OutHeadP=0x14fbf40) at fsServer.c:2671 #10 0x8055a44 in diskfs_demuxer (inp=0x14fdf40, outp=0x14fbf40) at ../../libdiskfs/demuxer.c:32 #11 0x8072e6e in internal_demuxer (inp=0x14fdf40, outheadp=0x14fbf40) at ../../libports/manage-multithread.c:101 ---Type <return> to continue, or q <return> to quit--- #12 0x807639a in mach_msg_server_timeout () #13 0x8072f34 in thread_function (master=0) at ../../libports/manage-multithread.c:136 #14 0x8071aab in cthread_body (self=0x8138ad0) at ../../libthreads/cthreads.c:294