I haven't used the debug facility in mpich, so I cannot
comment on that one. But can you just start the
job as usual and then, on the node that crashes,
attach gdb to the running process (XXX being its
process ID) with

gdb -pid XXX

?
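
To make the attach step concrete, here is a minimal sketch of a helper that prints one attach command per process ID. The function name print_attach_commands is made up for illustration; only the 'gdb -pid' form comes from the suggestion above.

```shell
#!/bin/sh
# Sketch only: print_attach_commands is a hypothetical helper name.
# It prints one 'gdb -pid' command line for every process ID given
# as an argument, so the commands can be reviewed before running them.
print_attach_commands() {
    for pid in "$@"; do
        echo "gdb -pid $pid"
    done
}
```

On the node in question you could call it as `print_attach_commands $(pgrep -x executableName)`, assuming pgrep is installed there. Once attached, typing `continue` at the gdb prompt lets the process run on, and gdb should stop again when the crash happens.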

I tend to use LAM, where you can write an
'application schema' file that specifies which
executable to start where. You can use it to
start a gdb session in a separate window for
every process. Not recommended for 32-processor
jobs, but excellent for development.

A sample application schema to start
'executableName' on two processors (each xterm
command goes on a single line in the file):

xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor0.log; read dummy"
xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor1.log; read dummy"

The gdbCommands file contains the gdb commands:

run arg1 arg2
where
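
For more than a couple of processes, the per-process lines can be generated instead of typed by hand. The following is a rough sketch under some assumptions: the function name write_appschema and the output file name 'appschema' are made up for illustration, and the generated lines simply repeat the xterm/gdb pattern from the sample, numbering the log files from 0.

```shell
#!/bin/sh
# Sketch only: write_appschema and the file name 'appschema' are
# illustrative, not part of LAM.
# Usage: write_appschema NPROCS EXECUTABLE OUTFILE
write_appschema() {
    nprocs="$1"
    exe="$2"
    out="$3"
    : > "$out"          # create or truncate the output file
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        # One xterm/gdb line per process, logging to processor<i>.log.
        printf 'xterm -e /bin/sh -c "gdb -command gdbCommands %s 2>&1 | tee processor%s.log; read dummy"\n' \
            "$exe" "$i" >> "$out"
        i=$((i + 1))
    done
}

# Reproduce the two-process example.
write_appschema 2 executableName appschema
```

The resulting file is then handed to LAM's mpirun in place of an executable; check the mpirun man page of your LAM version for the exact invocation.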

It should be possible to do something similar with
mpich.

Mattijs

On Monday 09 April 2007 18:30, Matt Funk wrote:
> Hi,
>
> i hope this is the right mailing list to post
> to...
>
> Anyway, i was wondering if i could get some
> advice/direction on how to debug my mpich
> program. I am running on a scyld configuration.
> What i am trying right now is the following:
>
> mpirun -dbg=gdb -nolocal -np 32 exec
>
> which starts the debugger in which i go
> run args
>
> which then starts the program. However, it
> doesn't get very far until it just sits there.
> When i ps all the processes are defunct.
>
> When i do the same thing except mpirun -dbg=gdb
> -nolocal -np 1 exec and run it in the debugger,
> the program starts running well.
>
> The reason i want to run on 32 processor
> though, is that it takes (on 32 procs) several
> hours till my program crashes. Also, i would
> like to be able to keep the conditions under
> which it crashes intact as much as possible
> (i.e. run on 32 procs rather than 1).
>
> Does anyone have any advice? I am open to try
> out other things as well if possible. I am just
> starting to learn debugger techniques for a
> parallel program.
>
> thanks
> mat
> _______________________________________________
> Beowulf mailing list, [EMAIL PROTECTED]
> To change your subscription (digest mode or
> unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 

Mattijs Janssens

OpenCFD Ltd.
9 Albert Road,
Caversham,
Reading RG4 7AN.
Tel: +44 (0)118 9471030
Email: [EMAIL PROTECTED]
URL: http://www.OpenCFD.co.uk