I haven't used the debug facility in mpich, so I cannot comment on that one.
But can you just start the job as usual and then, on the node that crashes,
attach to the running process with

    gdb -pid XXX

where XXX is the process id?

I tend to use lam, and there you can use an 'application schema' file, which
basically tells where to start what executable. You can use it to start a
gdb session in a separate window for every process. Not recommended for
32 processor jobs but excellent for development.

A sample application schema to start 'executableName' on two processors:

xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor0.log; read dummy"
xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor1.log; read dummy"

The gdbCommands file contains the gdb commands:

run arg1 arg2
where

It must be possible to do something similar with mpich.

Mattijs

On Monday 09 April 2007 18:30, Matt Funk wrote:
> Hi,
>
> i hope this is the right mailing list to post
> to...
>
> Anyway, i was wondering if i could get some
> advice/direction on how to debug my mpich
> program. I am running on a scyld configuration.
> What i am trying right now is the following:
>
> mpirun -dbg=gdb -nolocal -np 32 exec
>
> which starts the debugger in which i go
> run args
>
> which then starts the program. However, it
> doesn't get very far until it just sits there.
> When i ps all the processes are defunct.
>
> When i do the same thing except mpirun -dbg=gdb
> -nolocal -np 1 exec and run it in the debugger,
> the program starts running well.
>
> The reason i want to run on 32 processors
> though, is that it takes (on 32 procs) several
> hours till my program crashes. Also, i would
> like to be able to keep the conditions under
> which it crashes intact as much as possible
> (i.e. run on 32 procs rather than 1).
>
> Does anyone have any advice? I am open to try
> out other things as well if possible. I am just
> starting to learn debugger techniques for a
> parallel program.
>
> thanks
> mat
> _______________________________________________
> Beowulf mailing list, [EMAIL PROTECTED]
> To change your subscription (digest mode or
> unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Mattijs Janssens

OpenCFD Ltd.
9 Albert Road,
Caversham,
Reading RG4 7AN.
Tel: +44 (0)118 9471030
Email: [EMAIL PROTECTED]
URL: http://www.OpenCFD.co.uk
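
P.S. Writing the application schema lines by hand gets tedious beyond two processes. Below is a rough sketch of a helper that prints one line per process in the same form as the sample in the reply. The function name make_appschema and its arguments are made up for illustration and are not part of lam; the xterm and gdb invocation is copied from the sample.

```shell
#!/bin/sh
# Hypothetical helper (not part of lam): print one application schema line
# per process, each starting gdb in its own xterm as in the sample above.
#   $1 = executable name
#   $2 = number of processes
#   $3 = gdb command file (assumed default: gdbCommands)
make_appschema() {
    exe=$1
    nprocs=$2
    cmds=${3:-gdbCommands}
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        # Each process logs to its own processor<i>.log; 'read dummy'
        # keeps the xterm open after gdb exits.
        printf '%s\n' "xterm -e /bin/sh -c \"gdb -command $cmds $exe 2>&1 | tee processor$i.log; read dummy\""
        i=$((i + 1))
    done
}

# Example: print the schema for two processes to stdout.
make_appschema executableName 2
```

Redirect the output to a file of your choice and hand that file to lam in place of the program name. Whether your lam version accepts it that way is an assumption worth checking against its mpirun documentation.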