Hi,

On Thu, Nov 23, 2000 at 07:57:08PM -0500, Roland McGrath wrote:
> > This seems to be some race condition betweenm the sync thread and other
> > dn_set_?time mangling stuff. 
> 
> I would tend to agree.  Notice for example thread 3, which appears to be a
> peropen in the process of dying.  That is probably the temacs open file
> descriptor on the file being written, being closed within the few seconds
> while your sleep call was blocking ext2fs from crashing.  Something to
> think about is that (last I knew) temacs is writing the data with mmap
> rather than write; that indicates the possibility of the file pager being
> the suspect agent interacting with the sync thread.

This is probably strengthened by my new info. I followed your advice and
found out that the positive ctime and mtime settings come from block_getblk
in ext2fs/getblk.c: (Only the end of the function, with my change 1->14)

  node->dn_set_ctime = node->dn_set_mtime = 14;
  node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block;
  node->dn_stat_dirty = 14;

  return 0;
}

I did this test twice, with the same result.
You'll find the 14's in the attached stack trace. This seems to point at
diskfs_grow/pager_unlock_page vs sync thread (again! ;). I want to point out
that the above does not call diskfs_node_update, while inode_getblk does in
at least some cases:

  node->dn_set_ctime = node->dn_set_mtime = 13;
  node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block;
  node->dn_stat_dirty = 13;

  if (diskfs_synchronous || node->dn->info.i_osync)
    diskfs_node_update (node, 1);

  return 0;
}

Note that the assertion failure comes shortly after the message
"Dumping under names..." appears on the screen. dump-emacs calls
map_out_data, but a grep showed nothing further about map+out+data,
I don't know where it is defined... (strange).

> A wacky idea
> that might help narrow down quickly is to frob every place that sets
> dn_set_?time so that instead of setting them to 1 it uses a unique nonzero
> value in each location (or maybe fetches the caller's PC or something);
> then you should be able to see what code touched it last in the race.

Okay, that's done. I could probably set the value dependant on the caller by
passing down an arg from pager_unlock_page/diskfs_grow or somewhere, if this
might be helpful.

> > * The node->lock is held, which should probably avoid syncing?
> 
> > * Not all parts of the system which set some dn_set_?time flag call
> > diskfs_node_update consecutively (for example, write_symlink). 
> 
> I don't think that ought to be a problem; in fact, it would be seriously
> inefficient to make them do so.  This just means that the time fields don't
> need to be updated (e.g. atime for a read) until the node is making its
> normal way out to be synchronized.  Whenever someone does call
> diskfs_node_update, that will take care of dn_set_?time.  But I'm not
> entirely clearly on all this code.

Me too. Especially I am wondering what *should* guarantee that dn_set_?time
are cleared before write_node is entered. I can't see anything in the code
that takes care of that. write_node is called by

1. diskfs_write_disknode, which is only called from diskfs_node_update,
   which calls diskfs_set_node_times, which clears the flags (OKAY); and

2. write_one_disknode (in write_all_disknodes), which calls
   diskfs_set_node_times, then pokel_sync indirect blocks (with wait=1),
   then write_node.

What we see is a failure in case 2. Between write_one_disknode and
write_node, while pokel_sync runs, it seems that the file is grown and
block_getblk called. Note the utter lack of any locking here.

So what happens when we extend the file while it's indirect block list is
pokel_sync'ed to the disk? Is this actually possible?

Thanks,
Marcus

-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org [EMAIL PROTECTED]
Marcus Brinkmann              GNU    http://www.gnu.org    [EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.marcus-brinkmann.de
Script started on Fri Nov 24 02:50:43 2000
hurd:~# gdb /hrud/ex        /hurd/ext2fs.emacs 87n 
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-gnu0.2"...
/root/87: No such file or directory.
Attaching to program `/hurd/ext2fs.emacs', pid 87
[Switching to thread 87.1]
(gdb) inft o thr
  79 thread 87.79  0x809b6fc in mach_msg_trap ()

[..snip..]

  2 thread 87.2  0x809b6fc in mach_msg_trap ()
* 1 thread 87.1  0x809b6fc in mach_msg_trap ()
(gdb) bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80707c9 in cproc_block () at ../../libthreads/cprocs.c:665
#3  0x8070bec in condition_wait (c=0x80fa550, m=0x80fa528)
    at ../../libthreads/cprocs.c:811
#4  0x8071ff2 in cthread_exit (result=0x0) at ../../libthreads/cthreads.c:387
#5  0x804a599 in main (argc=2, argv=0x101ffa4) at ../../ext2fs/ext2fs.c:204
#6  0x80800b0 in __libc_start_main ()
(gdb) thr 2
[Switching to thread 2 (thread 87.2)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8076447 in mach_msg_server ()
#4  0x809c3f9 in _hurd_msgport_receive ()
#5  0x8071aab in cthread_body (self=0x812af18)
    at ../../libthreads/cthreads.c:294
(gdb) thr 3
[Switching to thread 3 (thread 87.3)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=1)
    at ../../libports/manage-multithread.c:136
#4  0x8073011 in ports_manage_port_operations_multithread (bucket=0x812b3e0, 
    demuxer=0x8055a18 <diskfs_demuxer>, thread_timeout=0, global_timeout=0, 
    hook=0) at ../../libports/manage-multithread.c:164
#5  0x8055627 in master_thread_function (foo=0x0)
    at ../../libdiskfs/init-first.c:35
#6  0x8071aab in cthread_body (self=0x812b518)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 3 4
[Switching to thread 4 (thread 87.4)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 4bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=1)
    at ../../libports/manage-multithread.c:136
#4  0x8073011 in ports_manage_port_operations_multithread (bucket=0x812b570, 
    demuxer=0x8062594 <pager_demuxer>, thread_timeout=0, global_timeout=0, 
    hook=0) at ../../libports/manage-multithread.c:164
#5  0x805053a in service_paging_requests (arg=0x812b570)
    at ../../libdiskfs/disk-pager.c:41
#6  0x8071aab in cthread_body (self=0x812b620)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 4 5
[Switching to thread 5 (thread 87.5)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 5bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x812bbf8)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 5 6
[Switching to thread 6 (thread 87.6)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 6bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x812bf20)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 6 7
[Switching to thread 7 (thread 87.7)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 7bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x812c020)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 7 8
[Switching to thread 8 (thread 87.8)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 8bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x8094474 in sleep ()
#3  0x804c7e6 in write_node (np=0x8154c20) at ../../ext2fs/inode.c:416
#4  0x804cbf1 in write_one_disknode (node=0x8154c20)
    at ../../ext2fs/inode.c:592
#5  0x804cb75 in diskfs_node_iterate (fun=0x108ff18)
    at ../../ext2fs/inode.c:568
#6  0x804ccb6 in write_all_disknodes () at ../../ext2fs/inode.c:599
#7  0x804ea90 in diskfs_sync_everything (wait=0) at ../../ext2fs/pager.c:925
#8  0x8052f44 in periodic_sync (interval=5)
    at ../../libdiskfs/sync-interval.c:120
#9  0x8071aab in cthread_body (self=0x812c328)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 8 9
[Switching to thread 9 (thread 87.9)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 9bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x81327a8)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 9 10
[Switching to thread 10 (thread 87.10)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 10bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x8132660)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 10 1
[Switching to thread 11 (thread 87.11)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 11bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x8132458)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 11 2
[Switching to thread 12 (thread 87.12)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 12bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x8133610)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 12 3
[Switching to thread 13 (thread 87.13)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 13bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80707c9 in cproc_block () at ../../libthreads/cprocs.c:665
#3  0x8070efc in __mutex_lock_solid (ptr=0x8154cbc)
    at ../../libthreads/cprocs.c:980
#4  0x805594d in diskfs_release_peropen (po=0x814d640)
    at ../../libdiskfs/peropen-rele.c:25
#5  0x8052769 in diskfs_protid_rele (arg=0x8149df0)
    at ../../libdiskfs/protid-rele.c:34
#6  0x8073653 in _ports_complete_deallocate (pi=0x8149df0)
    at ../../libports/complete-deallocate.c:49
#7  0x807292a in ports_port_deref (portstruct=0x8149df0)
    at ../../libports/port-deref.c:48
#8  0x8072e89 in internal_demuxer (inp=0x10dbf40, outheadp=0x10ddf40)
    at ../../libports/manage-multithread.c:104
#9  0x807639a in mach_msg_server_timeout ()
#10 0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#11 0x8071aab in cthread_body (self=0x8133870)
    at ../../libthreads/cthreads.c:294
(gdb) btthr 13 4
[Switching to thread 14 (thread 87.14)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 14bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80763f3 in mach_msg_server_timeout ()
#3  0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#4  0x8071aab in cthread_body (self=0x81333b8)
    at ../../libthreads/cthreads.c:294


[..snip..]


(gdb) thr 8
[Switching to thread 8 (thread 87.8)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x8094474 in sleep ()
#3  0x804c7e6 in write_node (np=0x8154c20) at ../../ext2fs/inode.c:416
#4  0x804cbf1 in write_one_disknode (node=0x8154c20)
    at ../../ext2fs/inode.c:592
#5  0x804cb75 in diskfs_node_iterate (fun=0x108ff18)
    at ../../ext2fs/inode.c:568
#6  0x804ccb6 in write_all_disknodes () at ../../ext2fs/inode.c:599
#7  0x804ea90 in diskfs_sync_everything (wait=0) at ../../ext2fs/pager.c:925
#8  0x8052f44 in periodic_sync (interval=5)
    at ../../libdiskfs/sync-interval.c:120
#9  0x8071aab in cthread_body (self=0x812c328)
    at ../../libthreads/cthreads.c:294
(gdb) up 4    bt full
#0  0x809b6fc in mach_msg_trap ()
No symbol table info available.
#1  0x80760fd in mach_msg ()
No symbol table info available.
#2  0x8094474 in sleep ()
No symbol table info available.
#3  0x804c7e6 in write_node (np=0x8154c20) at ../../ext2fs/inode.c:416
        st = (struct stat *) 0x8154c2c
        di = (struct ext2_inode *) 0x30129180
#4  0x804cbf1 in write_one_disknode (node=0x8154c20)
    at ../../ext2fs/inode.c:592
        node = (struct node *) 0x8154c20
        di = (struct ext2_inode *) 0x10004005
#5  0x804cb75 in diskfs_node_iterate (fun=0x108ff18)
    at ../../ext2fs/inode.c:568
        err = 0
        n = 975031623
        num_nodes = 96
        node = (struct node *) 0x8154c20
        node_list = (struct node **) 0x108fd40
        p = (struct node **) 0x108fd50
#6  0x804ccb6 in write_all_disknodes () at ../../ext2fs/inode.c:599
No locals.
#7  0x804ea90 in diskfs_sync_everything (wait=0) at ../../ext2fs/pager.c:925
---Type <return> to continue, or q <return> to quit---
No locals.
#8  0x8052f44 in periodic_sync (interval=5)
    at ../../libdiskfs/sync-interval.c:120
        err = 975031623
        link = {thread = 58, next = 0x0, prevp = 0x812c0ec, notifies = 0x0, 
  interrupted_next = 0x0}
#9  0x8071aab in cthread_body (self=0x812c328)
    at ../../libthreads/cthreads.c:294
        t = 0x812c2b8
(gdb) up 4
#4  0x804cbf1 in write_one_disknode (node=0x8154c20)
    at ../../ext2fs/inode.c:592
592     ../../ext2fs/inode.c: No such file or directory.
(gdb) print *node
$1 = {next = 0x80fa9e8, prevp = 0x80fa9e8, dn = 0x814df88, dn_stat = {
    st_fstype = 23, st_fsid = 87, st_ino = 65540, st_gen = 975031812, 
    st_rdev = 0, st_mode = 33188, st_nlink = 1, st_uid = 0, st_gid = 1005, 
    st_size = 4765562, st_atime = 975031623, st_atime_usec = 60000, 
    st_mtime = 975031623, st_mtime_usec = 370000, st_ctime = 975031623, 
    st_ctime_usec = 370000, st_blksize = 8192, st_blocks = 9336, 
    st_author = 0, st_flags = 0, st_spare = {0 <repeats 11 times>}}, 
  dn_set_ctime = 14, dn_set_atime = 0, dn_set_mtime = 14, dn_stat_dirty = 14, 
  lock = {held = 1, lock = 0, name = 0x0, queue = {head = 0x8133870, 
      tail = 0x8138ad0}}, references = 3, light_references = 1, sockaddr = 0, 
  owner = 0, transbox = {active = 0, lock = 0x8154cbc, flags = 0, wakeup = {
      lock = 0, queue = {head = 0x0, tail = 0x0}, name = 0x0, 
      implications = 0x0}, cookie = 0x8154c20}, userlock = {type = 8, wait = {
      lock = 0, queue = {head = 0x0, tail = 0x0}, name = 0x0, 
      implications = 0x0}, waiting = 0, shcount = 0}, conch = {
    lock = 0x8154cbc, wait = {lock = 0, queue = {head = 0x0, tail = 0x0}, 
      name = 0x0, implications = 0x0}, holder = 0x0, 
    holder_shared_page = 0x0}, dirmod_reqs = 0x0, filemod_reqs = 0x0, 
  allocsize = 4767744, cache_id = 65540, author_tracks_uid = 0}


[..snip..]


(gdb) btthr 78 9
[Switching to thread 79 (thread 87.79)]
#0  0x809b6fc in mach_msg_trap ()
(gdb) thr 79bt
#0  0x809b6fc in mach_msg_trap ()
#1  0x80760fd in mach_msg ()
#2  0x80707c9 in cproc_block () at ../../libthreads/cprocs.c:665
#3  0x8070efc in __mutex_lock_solid (ptr=0x8154cbc)
    at ../../libthreads/cprocs.c:980
#4  0x804c1d5 in diskfs_cached_lookup (inum=65540, npp=0x14fbd84)
    at ../../ext2fs/inode.c:78
#5  0x8051550 in diskfs_check_lookup_cache (dir=0x8137a10, 
    name=0x14fdf75 "emacs") at ../../libdiskfs/name-cache.c:256
#6  0x8050f99 in diskfs_lookup (dp=0x8137a10, name=0x14fdf75 "emacs", 
    type=LOOKUP, np=0x14fbe34, ds=0x0, cred=0x8145580)
    at ../../libdiskfs/lookup.c:149
#7  0x805a1a1 in diskfs_S_dir_lookup (dircred=0x8145580, 
    path=0x14fdf5c "emacs20", flags=0, mode=0, retry=0x14fbf64, 
    retryname=0x14fbf6c "", returned_port=0x14fc370, 
    returned_port_poly=0x14fbe84) at ../../libdiskfs/dir-lookup.c:126
#8  0x80569ea in _Xdir_lookup (InHeadP=0x14fdf40, OutHeadP=0x14fbf40)
    at fsServer.c:1511
#9  0x8057bab in diskfs_fs_server (InHeadP=0x14fdf40, OutHeadP=0x14fbf40)
    at fsServer.c:2671
#10 0x8055a44 in diskfs_demuxer (inp=0x14fdf40, outp=0x14fbf40)
    at ../../libdiskfs/demuxer.c:32
#11 0x8072e6e in internal_demuxer (inp=0x14fdf40, outheadp=0x14fbf40)
    at ../../libports/manage-multithread.c:101
---Type <return> to continue, or q <return> to quit---
#12 0x807639a in mach_msg_server_timeout ()
#13 0x8072f34 in thread_function (master=0)
    at ../../libports/manage-multithread.c:136
#14 0x8071aab in cthread_body (self=0x8138ad0)
    at ../../libthreads/cthreads.c:294

Reply via email to