On Sat, Jun 27, 2015 at 03:39:58PM +0100, James Clarke wrote:
> I have been suffering a lot from my Hurd system (running in VirtualBox)
> hanging at startup, just after "Hurd server bootstrap..." but before "INIT:
> version 2.88 booting".
>
> I have been able to trace it back to getblk.c:248 (unsigned long
> addr_per_block = EXT2_ADDR_PER_BLOCK (sblock);) in ext2_getblk. It faults
> because sblock is NULL.
>
> I have traced the execution with debugging statements, and what seems to
> happen is as follows:
>
> 1. diskfs_remount is called (because root is remounted as rw)
> 2. RPCs are inhibited
> 3. diskfs_reload_global_state is called
> 4. sblock is set to NULL
> 5. While this is happening, ext2_getblk is called
>
> If you’re lucky, the superblock is read and sblock is set to point to this
> data before 5 (or at least before it gets to dereferencing sblock). If not,
> sblock is still NULL and thus a page fault is raised, causing the system to
> be stuck.
>
> Does anyone have an idea how this situation could be occurring?

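The window described in steps 3 to 5 can be modelled with a minimal,
single-threaded sketch. Every name below (demo_super, demo_sblock,
demo_reload_begin, demo_reload_finish, demo_addr_per_block) is hypothetical
and only stands in for the ext2fs code; the real reader dereferences the
pointer unconditionally, whereas this model returns 0 so the NULL window can
be observed without faulting.

```c
#include <stddef.h>

/* Hypothetical stand-in for the ext2fs superblock; not the real struct. */
struct demo_super
{
  unsigned long addr_per_block;
};

static struct demo_super demo_disk_super = { 1024 };

/* Plays the role of the global sblock pointer. */
static struct demo_super *demo_sblock = &demo_disk_super;

/* Step 4: the reload path first drops the old pointer... */
static void
demo_reload_begin (void)
{
  demo_sblock = NULL;
}

/* ...and only later points it at the freshly read superblock. */
static void
demo_reload_finish (void)
{
  demo_sblock = &demo_disk_super;
}

/* Step 5: a reader in the style of ext2_getblk.  The NULL check exists
   only in this model, so that the window yields 0 instead of a fault.  */
static unsigned long
demo_addr_per_block (void)
{
  if (demo_sblock == NULL)
    return 0;
  return demo_sblock->addr_per_block;
}
```

Any reader that runs between demo_reload_begin and demo_reload_finish sees
the NULL pointer; in the real translator that reader would be another thread
and the dereference would fault.
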
My initial thought would be "how could it not happen?". Despite
diskfs_remount calling ports_inhibit_class_rpcs, other threads can very
well be running to process previously received messages. There seems to
be no other form of access synchronization such as locks in
diskfs_reload_global_state.

Can you get the call trace leading to ext2_getblk? I'm not sure about
backtrace(3) in static executables but it might be worth trying.

-- 
Richard Braun
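
For reference, a minimal sketch of what a backtrace(3) call could look like,
assuming glibc's <execinfo.h> is usable in the translator. The helper name
demo_dump_backtrace and the frame limit are hypothetical; where it would be
called from (for instance just before the dereference in ext2_getblk) is
also an assumption.

```c
#include <execinfo.h>
#include <unistd.h>

#define DEMO_MAX_FRAMES 32

/* Hypothetical helper: write the current call stack to stderr and
   return the number of frames captured.  */
static int
demo_dump_backtrace (void)
{
  void *frames[DEMO_MAX_FRAMES];
  int n = backtrace (frames, DEMO_MAX_FRAMES);

  /* backtrace_symbols_fd writes straight to the descriptor and does not
     call malloc, which is safer when the process is in a bad state.  */
  backtrace_symbols_fd (frames, n, STDERR_FILENO);
  return n;
}
```

In a static executable the symbol names may not be resolved, in which case
only raw addresses are printed; those can usually be mapped back to source
lines with addr2line run against the unstripped binary.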